Abstract: Suppose, contrary to fact, in 1950, we had put the cohort of
18 year old non-smoking American men on a stringent mandatory diet that 
guaranteed that no one would ever weigh more than their baseline weight estab- 
lished at age 18. How would the counter-factual mortality of these 18 year olds 
have compared to their actual observed mortality through 2007? We describe 
in detail how this counterfactual contrast could be estimated from longitudinal 
epidemiologic data similar to that stored in the electronic medical records of a
large HMO by applying g-estimation to a novel structural nested model. Our
analytic approach differs from any alternative approach in that, in the
absence of model misspecification, it can successfully adjust for (i) measured
time-varying confounders such as exercise, hypertension and diabetes that are
simultaneously intermediate variables on the causal pathway from weight gain to 
death and determinants of future weight gain, (ii) unmeasured confounding by 
undiagnosed preclinical disease (i.e., reverse causation) that can cause both poor
weight gain and premature mortality [provided an upper bound can be specified
for the maximum length of time a subject may suffer from a subclinical illness
severe enough to affect his weight before the illness becomes clinically
manifest], and (iii) the presence of particular identifiable subgroups, such as
those suffering from serious renal, liver, pulmonary, and/or cardiac disease, in
whom confounding by unmeasured prognostic factors is so severe as to render
useless any attempt at direct analytic adjustment. However, (ii) and (iii) limit
the ability to empirically test whether the structural nested model is
misspecified. The other two g-methods, the parametric g-computation algorithm
and inverse probability of treatment weighted (IPTW) estimation of marginal
structural models (MSMs), can adjust for potential bias due to (i) but not due
to (ii) or (iii).

Key words: BMI, confounders, G-estimation, reverse causation, structural 
nested failure time model 



1 Introduction 

Suppose, contrary to fact, in 1950, we had put the cohort of 18 year old non- 
smoking American men on a stringent mandatory diet that guaranteed that no 
one would ever weigh more than their baseline weight established at age 18. 
Specifically, each subject was weighed every day starting on the day before his 
18th birthday. Whenever his weight was greater than or equal to this baseline 
weight, the subject's caloric intake was restricted, without changing his usual 
mix of calorie sources and micronutrients, until the time (usually within 1-3
days) that the subject fell to below baseline weight. (I restrict to men solely 
to avoid the complicating issue of how much weight gain to allow during preg- 
nancy.) Thus, ignoring errors of a pound or two, no subject would ever weigh 
more than his baseline weight. No instructions or restrictions were given con- 
cerning exercise at any time or the amount or nature of what the subject ate 
during non-calorie restricted periods. How would the counterfactual mortality
of these 18 year olds have compared to the actual observed mortality through
2007?

Factually, a substantial fraction of 18 year old American males gain more
than 30 lbs from age 18 to 74. Thus if the counterfactual mortality were much
less than the observed mortality, then it would make sense for individuals to
maintain their baseline body weight by restricting caloric intake (regardless of
whether or not a practical, non-mandatory public health intervention exists that
would successfully maintain the baseline weight of most of the (non-smoking)
US population). Here and throughout we use the phrase "maintain their age
x bodyweight" to mean that after age x a subject's weight never exceeds his
weight at age x, although it may drop below that weight.

The difference between the counterfactual mortality were no one to exceed 
their age 18 body weight and the actual observed mortality of the non-smoking 
US population has been discussed by Willett et al (1) as a useful way to con- 
ceptualize the effect of weight on mortality. A major goal of this paper is to 
show that g-estimation of a novel structural nested model (SNM) can be used to 
directly estimate this difference from longitudinal observational data. A SNM
is a model that takes as input a subject's observed outcome, their observed
exposure (here, weight) history, and an unknown parameter, and outputs the
response that would have been observed if, contrary to fact, the subject had
followed the stringent mandatory diet described above. The unknown parameter vector
of a SNM is estimated via the g-estimation procedure introduced in Robins et al 
(12). Previous analytic approaches to the estimation of the effect of weight on 
mortality do not provide a direct estimate of this difference. In addition,
previous approaches have suffered from one or more of the following sources of
bias: (i) failure to adequately control for measured confounding due to
time-varying exercise, blood lipids, blood pressure, diabetes, and other chronic
diseases (once diagnosed) because of concerns that one will thereby be
controlling for intermediate variables on the causal pathway from overweight to
death, (ii) failure to adequately control for unmeasured confounding due to
undiagnosed chronic disease such as cancer (i.e., reverse causation), and (iii)
failure to update the
weight of a subject whose weight changes after start of follow-up, because of 
concerns about reverse causation and measurement error. 

Bias due to confounding by measured time-varying confounders that are
also intermediate variables can be controlled by the use of so-called g-methods.
G-methods are statistical methods specifically designed to control bias
attributable to time-varying confounders affected by previous exposure. In
addition to g-estimation of structural nested models, g-methods include the
parametric g-formula estimator and inverse probability of treatment weighted
(IPTW) estimators (7,15). As yet g-methods have not been used to estimate the
effect of overweight on mortality, with the exception of Ref. (2), where the
parametric g-formula estimator was used. In this paper we concentrate on
g-estimation of
SNMs, because as discussed below, of the three g-methods, only g-estimation 
of SNMs can adjust for unmeasured confounding due to undiagnosed chronic 
disease. 

Finally g-estimation of SNMs allows one to update the weight of a subject 
whose weight changes after start of follow-up without introducing any bias due 
to reverse causation. However issues of measurement error are more tricky and 
will be discussed in the final section of the paper. 

Even if maintenance of age 18 weight improves mortality, perhaps a mandatory
intervention that allowed weight gain of 0.3/12 pounds per month (i.e.,
3 pounds per decade) would produce an even lower mortality. Perhaps the
mandatory intervention that would produce the lowest mortality (i.e., the
optimal intervention among all "weight-gain" interventions) is one that allows
a weight gain of 0.3/12 pounds per month in subjects free of hypertension,
diabetes, hyperlipidemia, or clinical CHD, but of only 0.1/12 pounds per
month (i.e., 1 pound per decade) once a subject developed one of these risk
factors.

To decide which mandatory intervention is optimal, we require a well-defined 
numerical measure of overall mortality that can be used to rank interventions. 
For example, one might use the total years of life (or quality adjusted life) expe- 
rienced by the cohort from 1950-2007 as a measure. Use of this measure is math- 
ematically equivalent to the use of "years (or quality-adjusted years) of life lived 
from 1950-2007" as the (subject-specific) utility function in a decision problem
whose goal is to maximize expected utility. "Years (or quality-adjusted years) 
of life lived" measures have a much more natural and useful public health and 
policy interpretation than the rate ratio, attributable fraction, and attributable 
risk measures routinely reported in epidemiologic studies. 

However even "years of life lived from 1950-2007" is an inadequate utility 
function when follow-up of the cohort is not to extinction. This function inap- 
propriately assigns the same utility not only to all subjects alive at age 74 on 
Jan 1, 2008 regardless of their state of health, but also to a subject who dies 
on Dec. 31, 2007 at 11:59 pm. Clearly among survivors in 2008, the healthier 
ones (according to some agreed on standard measure of current health) have
a greater post-study expected survival (and thus warrant a higher utility) than 
the less healthy survivors and a much greater expected survival (and thus war- 
rant a much higher utility) than the non-survivors who died in late December 
2007. We will not discuss further precisely how to decide on an appropriate 
utility measure for the survivors, except to remark that such a discussion is 
necessary. Rather, we will simply assume that, at the end of follow-up, each 
cohort member has been given a utility measure Y. 

Note that the benefit of any of the above counterfactual interventions is an 
overall effect of the intervention. For example it is conceivable that the mortality 
benefit of the intervention that maintained baseline weight was wholly due to 
changes in exercise. Perhaps maintenance of baseline weight makes individuals 
feel so much better that they exercise more.

In Section 2, we assume we have observational retrospective follow-up data
through 2007 on a random sample of the cohort of US males who were non-smokers
and 18 in 1950. The data includes detailed medical records, analogous
to those currently available on subscribers to a comprehensive HMO. In Section
2.2, I discuss three major sources of potential bias that complicate any attempt
to estimate the overall effect of the mandatory intervention "maintain baseline
weight" on the expected utility of our cohort: (i) measured time-varying
confounders such as exercise, hypertension and diabetes that are potentially
intermediate variables, (ii) unmeasured confounding by undiagnosed preclinical
disease (i.e., reverse causation) that can cause both poor weight gain and
premature mortality, and (iii) the presence of particular identifiable
subgroups, such as those suffering from serious renal, liver, pulmonary, and/or
cardiac disease, in whom confounding by unmeasured prognostic factors is so
severe as to render useless direct analytic adjustment for confounding. In
Section 3, I describe how g-estimation of a correctly specified SNM can
appropriately adjust for these potential sources of bias [provided an upper
bound can be specified for the maximum length of time a subject may suffer from
a subclinical illness severe enough to affect his weight before the illness
becomes clinically manifest].
The SNM required for this adjustment is novel in two ways. First it is a joint 
SNM, combining a structural nested failure time model (SNFTM) for the coun- 
terfactual time to the earlier of death or the diagnosis of a chronic illness and a 
conditional structural nested mean model (SNMM) for the counterfactual mean 
of a subject's counterfactual utility given his counterfactual time to death or a 
diagnosed chronic illness. Second, our SNM only models the causal effect of
any increase in BMI between month m and m + 1 over a subject's maximum
previous BMI. In particular, it does not model and thus is agnostic about the
causal effect a) of any decrease in BMI or b) of any increase in BMI between
m and m + 1 that fails to attain the previous maximum. As a consequence, our
SNM is more robust than standard SNMs that also model a) and b), because
our model makes fewer assumptions than such alternative models, and thus is
less likely to be misspecified. However, the small number of assumptions made
by our SNM are sufficient to consistently estimate our parameter of interest
E[Y_0].

In Sections 3.2.4-3.2.5, however, I show that (ii) and (iii) limit the ability to 
empirically test whether the joint structural nested model is misspecified. I also 
show that, somewhat remarkably, to adjust for bias due to reverse causation one 
need not assume a deterministic rank-preserving SNM. This is important since 
a deterministic rank-preserving SNM assumes that the effect of weight gain
on mortality is the same for different subjects, an assumption that is clearly
biologically implausible. In Section 4, I consider how to account for censoring
by administrative end of follow-up. In Section 5, I consider the estimation of
the expected utility under alternative dietary interventions. In Section 6, I
discuss the consequences of measurement error in BMI. Proofs and statements
of several new theorems are collected in Appendices 1 and 2. Finally, estimation
of the optimal "weight-gain" intervention is discussed in Appendix 3.

2 Estimation of an overall effect



2.1 The Data 

I first describe the observational data that is assumed to be available. First,
I suppose that a subject's BMI is recorded at the end of each month t,
t = 0, 1, ..., K, where time t is in months since age 18 and
K + 1 = (2007 - 1950) x 12 is the duration of follow-up. Let A*(t) be the
difference between BMI at the end of month t and at the end of month t - 1. Let
L(t) be the vector of covariate values recorded in month t and suppose L(t)
precedes A*(t) temporally. L(t)
includes blood pressure, HDL and LDL measures of cholesterol, any diagnoses 
of and clinical and laboratory characteristics of any chronic disease such as can- 
cer, CAD, diabetes, asthma, COPD, liver, renal disease, etc., level of exercise, 
measures of mobility and disability, etc. The vector L (t) also includes BMI (t) , 
the BMI just before t rounded to the nearest pound. Thus 

A*(t) = BMI(t + 1) - BMI(t).    (1)

L(t) also includes the indicator I(T > t) of vital status at the beginning of
month t, where T is the death time of a subject and, for any proposition B, I(B)
is the indicator function that takes the value 1 if B is true and zero otherwise.
Thus I(T > t) = 1 if a subject is alive at t and zero if dead at t. If
I(T > t) = 0, I include in L(t) the exact day of death and, by convention,
assign the value zero to all other components of L(t).

By convention, set A* (t) and the remaining components of L (t) to zero once 
a subject has died. 
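
As a concrete illustration of equation (1), the following Python sketch computes the monthly changes A*(t) from a recorded BMI series. The function name monthly_bmi_changes and the sample values are hypothetical and are not part of the data described here; the sketch also ignores the convention for subjects who have died.

```python
from typing import List


def monthly_bmi_changes(bmi: List[float]) -> List[float]:
    """Return A*(t) = BMI(t + 1) - BMI(t) for t = 0, ..., len(bmi) - 2.

    bmi[t] is the BMI recorded just before month t, so bmi[0] is the
    baseline BMI at age 18, which is treated as a covariate rather
    than as an exposure.
    """
    return [bmi[t + 1] - bmi[t] for t in range(len(bmi) - 1)]


# Hypothetical BMI values for the first five monthly measurements.
bmi_series = [22.0, 22.5, 22.0, 23.0, 23.0]
changes = monthly_bmi_changes(bmi_series)
```

For these illustrative values the changes are 0.5, -0.5, 1.0 and 0.0, so a series of n BMI measurements yields n - 1 exposure values.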

The baseline covariates L (0) include covariate and BMI data on a subject 
before follow up starts at age 18. Specifically, let BMI (0) denote BMI at (just 
before) age 18 (i.e., time 0). Our inclusion of BMI just before age 18 as a 
covariate rather than a treatment reflects the fact that "change" in BMI since 
18 is our exposure. In particular, note that A* (0) is the difference between 
BMI recorded just before 18 yrs and 1 month and BMI recorded just before 
18 years. As is standard in the literature, I have taken change in BMI rather
than change in weight in pounds as the exposure variable. Let Ā*(t) and
L̄(t) be the change in BMI and covariate histories through time t, let
Ā* = Ā*(K) be a subject's (change in) BMI history through month K, and let
L̄ = L̄(K + 1) be the L history through the end of the study. A subject's
utility Y, a measure of quality-adjusted survival, is calculated from
L̄ = L̄(K + 1), since L̄ includes the survival time of nonsurvivors, health status
measures for survivors at end of follow-up, and time-varying health status
factors.

2.2 Potential for Measured and Unmeasured Confounding:

2.2.1 Reverse Causation and Unmeasured Confounding by Subclinical Disease:

In the literature on the effect of BMI on mortality, a controversy has arisen as
to whether and how to modify standard analytic methods to account for "reverse
causation". Reverse causation refers to the well-accepted fact that preclinical
(i.e., undiagnosed) chronic disease, such as preclinical cancer, can cause both
weight loss (or diminished weight gain) and death. It follows that among
subjects with identical BMI history (Ā*(t - 1), BMI(0)) and measured covariate
history L̄(t) before time t, the subset whose monthly change A*(t) in BMI is
negative is not comparable with regard to mortality risk to the subset with
positive A*(t), even if BMI has no causal effect on mortality. That is, reverse
causation implies unmeasured confounding by undiagnosed chronic disease. In 
fact, by an analogous argument, even among the subset with A* (t) positive, 
there will be unmeasured confounding, because those with a small gain in BMI 
are more likely to have preclinical disease than those with a substantial gain. 

It follows that one requires an analytic method that can adjust for
unmeasured confounding due to the presence of preclinical disease. I will present a
method that is appropriate under the additional assumption that we are able to 
specify an upper bound on the length of time a subject may have a subclinical 
illness severe enough to affect his weight, before that illness becomes clinically 
manifest. 

2.2.2 Measured Confounders that are also Intermediate Variables 

I next turn to the issue of confounding by measured factors, i.e., by components 
of the covariate vector L (t) . For pedagogic purposes, in this subsection, it will 
be simpler to imagine that the unmeasured confounding due to reverse causa- 
tion discussed above is not present. Now it is fairly well accepted that obesity 
causes increased blood pressure (BP), increased low density lipoproteins (LDL), 
diabetes (Db), and decreased exercise and these four factors may in turn cause 
increased mortality. Thus these four variables are intermediate variables on the 
causal pathway from BMI to mortality. In order to prevent underestimation of 
the overall effect of BMI on mortality due to adjusting for intermediate vari- 
ables, many analyses of the effect of BMI on mortality have failed to adjust for 
BP, LDL, Db, or exercise in the analysis. However, such a decision can only be
justified if these potential intermediate variables do not also confound the
BMI-mortality relationship.

A sufficient condition for these intermediate variables to also be confounders
is that, among subjects with identical BMI history (BMI(0), Ā*(t - 1)) until
t, the subset whose monthly change A*(t) in BMI is negative is not comparable
with regard to past BP, LDL, Db, and exercise history to the subset with positive
A*(t). Such non-comparability implies that, if data on time-varying BP, LDL,
Db, and exercise history are not used in the analysis, there will exist a
non-causal association between an increase of A*(t) in BMI during month t and
subsequent adverse mortality, even under the null hypothesis of no overall effect
of BMI on mortality. Such non-comparability can occur whenever some or all of
these intermediate variables are either a cause of a change in BMI or are
correlated with an unmeasured cause of a change in BMI.

For example, it is likely that lack of exercise causes weight gain. In that 
case, if regular exercise causes decreased mortality, then, in an analysis that 
fails to adjust for exercise history prior to t, the association between an increase 
of A* (t) in BMI during month t and subsequent adverse mortality will be an 
overestimate of the true causal effect of A* (t) on mortality, due to uncontrolled 
confounding by exercise. 
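
The overestimation described in the preceding paragraph can be checked with exact arithmetic. The Python sketch below is an illustration only: it collapses exercise, weight gain and death into single binary variables (E, G, D), and every probability in it is a hypothetical value chosen for the example. Death risk is made to depend on exercise alone, so the true effect of weight gain is null.

```python
from fractions import Fraction as F

# Hypothetical probabilities, chosen only for illustration.
P_EXERCISE = F(1, 2)                      # P(E = 1), E = regular exercise
P_GAIN = {1: F(1, 5), 0: F(3, 5)}         # P(G = 1 | E = e), G = weight gain
# P(D = 1 | E = e, G = g): death risk depends on exercise only (null effect of G).
P_DEATH = {(1, 0): F(1, 10), (1, 1): F(1, 10),
           (0, 0): F(3, 10), (0, 1): F(3, 10)}


def p_e(e: int) -> F:
    """P(E = e)."""
    return P_EXERCISE if e == 1 else 1 - P_EXERCISE


def joint(e: int, g: int) -> F:
    """P(E = e, G = g)."""
    return p_e(e) * (P_GAIN[e] if g == 1 else 1 - P_GAIN[e])


def crude_risk(g: int) -> F:
    """P(D = 1 | G = g), ignoring exercise."""
    numerator = sum(joint(e, g) * P_DEATH[(e, g)] for e in (0, 1))
    return numerator / sum(joint(e, g) for e in (0, 1))


# Unadjusted comparison of gainers with non-gainers.
crude_difference = crude_risk(1) - crude_risk(0)

# Exercise-standardized risk difference: average the stratum-specific
# differences over the marginal distribution of E.
adjusted_difference = sum(
    p_e(e) * (P_DEATH[(e, 1)] - P_DEATH[(e, 0)]) for e in (0, 1)
)
```

With these numbers the crude risk difference is 1/12 even though weight gain has no effect within either exercise stratum, and the exercise-standardized difference is exactly zero.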

Similarly, suppose that chronic emotional stress and low grade depression not 
only cause weight gain by inducing overeating as a soothing, self-medicating be- 
havior, but also directly cause elevated BP, elevated LDL, and Db independently 
via various stress-induced metabolic, immune, and sympathetic nervous system 
effects. If, as is true in most observational data bases, data on chronic emotional 
stress and low grade depression are not recorded (i.e., measured), then, even un- 
der the null hypothesis of no overall effect of BMI on mortality, the association 
between an increase A* (t) in BMI and subsequent adverse mortality will tend 
to be positive, whether or not one adjusts for elevated BP, elevated LDL, and 
Db in the analysis, due to uncontrolled confounding by chronic emotional stress 
and low grade depression. However, these variables should be adjusted for in the 
analysis, because the magnitude of positive overestimation will often be much 
less if they are adjusted for, because of their correlation with the unmeasured 
causal confounder, chronic emotional stress and depression.

In contrast with the last paragraph, suppose there is no confounding by 
chronic emotional stress and low grade depression; rather, in the observational 
data base, most individuals who developed an elevated BP, elevated LDL, or
Db became concerned about their health and instituted a diet that resulted 
in their gaining less weight than those without these conditions. Then the 
association found between an increase A* (t) in BMI and subsequent adverse 
mortality in an analysis that fails to adjust for these variables at t would tend 
to underestimate the true causal effect of BMI on mortality due to negative 
confounding. I conclude that elevated BP, elevated LDL, and Db could confound 
the association between increase in BMI and subsequent adverse mortality in 
either a negative or positive direction, depending on which of the mechanisms 
described in this paragraph and the last predominates. 

In summary, time-dependent covariates such as exercise (i.e., physical ac- 
tivity), BP, LDL, or Db that are recorded in L (t) may be both intermediate 
variables on the causal pathway from BMI to death and confounders of the
BMI-death relationship. It follows that one requires an analytic method that
can appropriately adjust for the effects of measured time-varying covariates that
are simultaneously intermediate variables and time-dependent confounders. 

2.2.3 Intractable Unmeasured Confounding in Subgroups

There may be subgroups defined by measured variables in whom confounding
by unmeasured factors is intractable. For example, among persons with diag- 
nosed chronic renal, liver, pulmonary or cardiac disease, rapid weight gain can 
indicate increasing edema (water retention) due to unmeasured disease progres- 
sion rather than increasing fat stores; as a consequence, among chronic disease 
patients with identical pasts, comparability would not hold because individuals 
experiencing rapid weight gain may be at increased risk of death due to unmea- 
sured progression of disease compared to those with lesser weight gain. In such 
a case unmeasured confounding by disease progression may be intractable. 

Using other arguments, various investigators have argued that in both the 
subgroup of subjects over age 70 and the subgroup with BMI less than 21, 
subjects gaining weight at different rates are not comparable owing to
unmeasured confounding factors, even when data has been collected on many
potential confounders.

Therefore one needs an analytic method that can remain valid even when 
there exists intractable confounding among subjects with a diagnosed chronic 
disease, an age of greater than 70, or a BMI below 21. In the next section, I 
describe an analytic method that satisfies the requirements of this and the two 
previous subsections. 

2.3 A Simplified Description of G-estimation of Structural 
Nested Models (SNMs): 

In this subsection I give a nontechnical, conceptual description of how, even 
in the presence of the measured and unmeasured confounding described in
Section 2.2, g-estimation of structural nested models can be used to estimate 
the expected utility had, contrary to fact, all non-smoking 18 year old American 
men in 1950 been put on a stringent mandatory diet that guaranteed that no 
one would ever weigh more than their weight at age 18. In order to avoid 
technical digressions and thereby keep the description centered on important 
conceptual issues, this nontechnical description is neither complete nor fully 
accurate. Section 3 onwards provides a complete and accurate description. This 
completeness and accuracy unfortunately place greater technical demands on the 
reader. 

A locally rank preserving SNM for Y is a rule that takes as input a subject's
observed utility Y, their observed BMI and covariate history through the end
of the study, and an unknown parameter β*, and outputs the utility Y_0 that
would have been observed if, possibly contrary to fact, the subject had followed
the dietary intervention of the first paragraph of the Introduction. If the rule is
correct and we knew the value of β*, then we could calculate Y_0 for each study
subject. The average of these Y_0 in the cohort of all non-smoking 18 year old
American men in 1950 is our quantity of interest: the expected (i.e., average)
utility had one implemented a dietary intervention that guaranteed that no one
would ever weigh more than they did at age 18. However, we do not know the
value of β*. Thus the challenge is to estimate β* from the data. When, as in
Section 2.2.2, all confounding is due to measured variables, Robins (12) proposed
a method of estimation called g-estimation that is described next.

If the only confounding is due to measured factors, then among subjects with
the same BMI and covariate history prior to time t with nonnegative A(t), the
increase A(t) in BMI between t and t + 1 will be conditionally uncorrelated with
Y_0. Thus to estimate β*, we simply try many different guesses β. If a particular
guess β were the true β*, then the output of the rule would be uncorrelated
with A(t). Thus I choose as our estimate β̂ of β* the guess β which results in
an output that has the smallest conditional correlation with A(t) when we combine
the information across all months t from 0 to end of follow-up at K + 1.
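
The search over guesses can be sketched in a few lines of Python. This is a toy illustration under loudly stated assumptions, not the estimator developed in Section 3: it uses a single time point, a single discrete covariate stratum in place of the full history, a hypothetical linear rule H(β) = Y - β A as a stand-in for the SNM, and a pooled within-stratum cross-product as a stand-in for the conditional correlation. The records are invented and were generated with a true parameter of 2.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical records (stratum L, exposure A, observed utility Y).
# They were generated as Y = Y0 + 2 * A, where Y0 is uncorrelated
# with A inside each stratum, so the true parameter is 2.
records: List[Tuple[int, float, float]] = [
    (0, 0.0, 10.0), (0, 1.0, 14.0), (0, 2.0, 16.0), (0, 3.0, 16.0),
    (1, 0.0, 20.0), (1, 2.0, 28.0), (1, 4.0, 32.0), (1, 6.0, 32.0),
]


def pooled_cross_product(beta: float) -> float:
    """Sum over strata of the centered cross-product of H(beta) and A."""
    by_stratum: Dict[int, List[Tuple[float, float]]] = defaultdict(list)
    for stratum, a, y in records:
        # Assumed illustrative rule: H(beta) = Y - beta * A.
        by_stratum[stratum].append((a, y - beta * a))
    total = 0.0
    for pairs in by_stratum.values():
        mean_a = sum(a for a, _ in pairs) / len(pairs)
        mean_h = sum(h for _, h in pairs) / len(pairs)
        total += sum((a - mean_a) * (h - mean_h) for a, h in pairs)
    return total


# Candidate guesses 0.0, 0.5, ..., 4.0; keep the guess whose output is
# least associated with A within strata.
grid = [k / 2 for k in range(9)]
beta_hat = min(grid, key=lambda b: abs(pooled_cross_product(b)))
```

On these records the search returns 2.0, the value used to generate them. In practice the grid would be replaced by a numerical root finder and the cross-product by an estimating function that conditions on the full measured history.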

When, as in Section 2.2.3, there are certain identifiable subgroups in whom
confounding is intractable, bias can result because the output of the rule will be
conditionally correlated with A(t) even when β = β*. To eliminate this bias,
it suffices to search for lack of correlation with A(t) only among the subset of
subjects who are not members of these intractably confounded subgroups at
time t. That is, we simply restrict our g-estimation procedure at a given time t
to subjects who are not currently members of these subgroups.

When there is unmeasured confounding by subclinical disease such as in
Section 2.2.1, I must modify our g-estimation procedure. Suppose one can
specify an upper bound, say 6 years, on the length of time a subject may have a
subclinical illness severe enough to affect weight gain, before that illness becomes
clinically manifest. Then one can still validly estimate β* if one restricts the
g-estimation procedure at a given time t to those subjects who would have
remained alive and free of a diagnosed chronic (i.e., clinical) disease for the
six years following t had, possibly contrary to fact, they followed a diet that
prevented any further weight gain over those 6 years; by our assumption of a
6-year upper bound, such subjects did not have their weight gain affected by
an undiagnosed chronic disease. [It does not suffice to restrict to subjects who
actually remained alive and free of clinical disease for the six years following
t, because if BMI change A(t) at t causally affects the onset of clinical disease
and/or survival in the following six years, the variable 'survival without clinical
disease for six years after t' is a response affected by the exposure A(t) and
thus cannot be adjusted for without introducing selection bias as explained in
Hernan et al. (13).] Thus to validly estimate β* using g-estimation, one must
be able to determine those "subjects who would have remained alive and free of
clinical disease for the six years following t had, possibly contrary to fact, they
followed a diet that prevented any further weight gain over those 6 years."

One can do so by specifying a second SNM, called a locally rank preserving
structural nested failure time model (SNFTM), for the effect of change in BMI
on the time X to the diagnosis of chronic disease or death (whichever comes
first). A locally rank preserving SNFTM is a rule that takes as input a
subject's observed time X to (the earlier of) death or a diagnosed chronic
disease, their observed BMI and covariate history through the end of the study,
an unknown parameter ψ*, and a time t, and outputs the time X_t that would
have been observed if, possibly contrary to fact, the subject had followed a
dietary intervention in which no further weight was gained after time t. If ψ*
were known or well-estimated, we could compute X_t for each subject, determine
which subjects' X_t failed to exceed t by more than 6 years, and exclude such
subjects from the g-estimation procedure used to estimate β*.
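
The exclusion step can be sketched as a simple filter. The Python fragment below assumes that the SNFTM outputs X_t have already been computed and are supplied in months since age 18; the function name eligible_subjects, the subject identifiers and the X_t values are all hypothetical, and the six-year bound is the illustrative value used in the text.

```python
from typing import Dict, List

LAG_MONTHS = 6 * 12  # assumed 6-year upper bound on subclinical illness duration


def eligible_subjects(x_t: Dict[str, float], t: int,
                      lag: int = LAG_MONTHS) -> List[str]:
    """Return ids of subjects whose X_t exceeds t by more than lag months."""
    return sorted(sid for sid, x in x_t.items() if x - t > lag)


# Hypothetical SNFTM outputs X_t (months since age 18) evaluated at t = 120.
x_at_120 = {"s1": 150.0, "s2": 192.0, "s3": 400.0, "s4": 193.0}
kept = eligible_subjects(x_at_120, t=120)
```

Here subjects s3 and s4 are retained at t = 120, while s1 and s2 are excluded because their X_t does not exceed t by more than 72 months. The restriction is applied separately at each time t, so a subject may contribute at some months and not at others.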

Thus it only remains to estimate the parameter ψ* of our locally rank preserving
SNFTM in the presence of unmeasured confounding by subclinical disease.
Now among subjects with the same BMI and covariate history prior to time t
with nonnegative A(t) who are not members of an identifiable subgroup with
intractable confounding, the change A(t) in BMI between t and t + 1 will be
uncorrelated with X_t if we restrict to subjects with X_t exceeding t by more
than 6 years. Thus to estimate ψ*, I simply try many different guesses ψ. If a
particular guess ψ were the true ψ*, the output of the SNFTM rule would be
uncorrelated with A(t) when I restrict the g-estimation procedure to subjects
whose output exceeds t by more than 6 years. Thus I choose as the estimate
ψ̂ of ψ* the guess ψ which results in an output that, under this restricted
g-estimation procedure, has the smallest conditional correlation with A(t) when I
combine the information across all times t from 0 to end of follow-up at K + 1.

Before proceeding to the more technical part of the paper, I provide a brief 
non-technical discussion of several important but subtle points about SNMs. 
First, locally rank preserving SNMs assume that the effect of a given increase 
in BMI on the utility Y and on X is the same for any two subjects with the 
same past measured covariate history. This assumption is biologically
implausible since unmeasured genetic and environmental factors will clearly
modify the magnitude of the effect of weight gain on the responses Y and X.
Fortunately, we prove in Section 3 that our g-estimator of the mean of Y_0
remains valid even if we allow the magnitude of the effect of weight gain on Y
and X to be modified in an arbitrary manner by unmeasured genetic and
environmental factors.

The description of g-estimation of the parameters ψ* of our SNFTM model
for X assumed that the time X to death or diagnosed chronic disease was
available for every study subject. However, by end of follow-up, a number of
study subjects will remain alive and free of chronic disease. Such subjects are
said to be censored. In Section 4, I show how our g-estimation procedures can
be modified to appropriately account for these censored observations.

The estimate of the mean of Y_0 will be biased if either the SNMM for Y or
the SNFTM for X is misspecified. I discuss below how to construct tests for
misspecification. However, I also show that the power of such tests to detect
model misspecification can be quite limited in the presence of reverse causation
by subclinical disease and intractable confounding in identifiable subgroups. In
Section 3.3, I offer some suggestions on how the impact of this limited power 
on the quality of one's inferences can be lessened if one is willing to change the 
parameter that is being estimated.

3 Estimation of the effect of the "maintain baseline weight intervention"

In this section I describe how we can use g-estimation of structural nested
models (SNMs) to estimate the expected utility had one put all non-smoking
18 year old American men on a stringent mandatory diet that guaranteed that 
no one would ever weigh more than their baseline weight established at age 18. 
For pedagogic reasons I first consider the simpler setting in which there is no 
unmeasured confounding by preclinical disease. 

3.1 Case 1: No unmeasured confounding by preclinical 
disease. 

3.1.1 A Locally Rank Preserving SNM. 

An SNM is a model for counterfactual variables $Y_m$ that denote a subject's
utility measured at end of follow-up under the following counterfactual dietary
intervention:

Time m Dietary Intervention: The subject follows his observed diet up
to month m following his 18th birthday and, from month m onwards, the subject
is weighed every day: (i) whenever his weight is greater than or equal to his
maximum monthly BMI up to m [i.e., $BMI_{\max}(m) = \max\{BMI(0), \ldots, BMI(m)\}$],
the subject's caloric intake is restricted until the subject's BMI falls to below
$BMI_{\max}(m)$; (ii) whenever his weight is less than $BMI_{\max}(m)$, the subject is
allowed to eat as he pleases without any intervention.

A subject's responses had, possibly contrary to fact, he been made to follow 
a time m dietary intervention are referred to as counterfactual responses. We 
assume that $Y_m$ is well-defined in the sense that its value is insensitive to the
unspecified details of exactly how the subject's calories are to be restricted in 
(i). We also assume a subject's counterfactual responses are observed only for 
those m for which a subject's actual BMI history was consistent with his having 
followed the time m dietary intervention. For other values of m, the time m- 
specific counterfactuals remain unobserved. 

The time 0 dietary intervention is the dietary intervention in the first
paragraph of the Introduction. The counterfactual $Y_0$ is the utility corresponding
to this regime. Thus the expected value $E[Y_0]$ of $Y_0$ is our parameter of interest:
the expected utility had we placed in 1950 all non-smoking 18 year old American
men on a diet that guaranteed that no one would ever weigh more than they
did at age 18.

Note that $Y_{K+1} = Y$: if one were to follow his actual observed diet up to the
time K + 1 at which the study ends, then no dietary intervention would have
occurred. Hence the counterfactual $Y_{K+1}$ must be the observed (i.e., actual) Y.

By definition, a subject's observed data through k (but before k + 1) is
inconsistent or incompatible with following the "time m dietary intervention" if
and only if $BMI(k+1) > BMI_{\max}(m)$ for some $k \ge m$.

Let us define A(t) to be the difference between a subject's observed BMI,
BMI(t + 1), just prior to month t + 1 and his maximum value $BMI_{\max}(t)$
of BMI up to month t, whenever that difference is nonnegative. When the
difference is negative, we simply set A(t) to be zero. Formally then

$$A(t) = BMI(t+1) - BMI_{\max}(t) \quad \text{if } BMI(t+1) \ge BMI_{\max}(t) \qquad (2)$$
$$A(t) = 0 \quad \text{if } BMI(t+1) < BMI_{\max}(t) \qquad (3)$$
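
As a concrete illustration of definitions (2) and (3), the following minimal Python sketch computes the running maximum $BMI_{\max}(t)$ and the blip A(t) from one subject's monthly BMI series. The function name `blips_from_bmi`, the list layout (index t holds BMI(t) for t = 0, ..., K + 1), and the numbers are assumptions made for this example only.

```python
from typing import List, Tuple


def blips_from_bmi(bmi: List[float]) -> Tuple[List[float], List[float]]:
    """Return (bmi_max, a) for t = 0..K, given bmi[t] = BMI(t) for t = 0..K+1.

    bmi_max[t] is the maximum of BMI(0), ..., BMI(t); a[t] follows
    equations (2)-(3): the excess of BMI(t+1) over bmi_max[t], or zero.
    """
    if len(bmi) < 2:
        raise ValueError("need at least BMI(0) and BMI(1)")
    bmi_max: List[float] = []
    a: List[float] = []
    running_max = bmi[0]
    for t in range(len(bmi) - 1):
        running_max = max(running_max, bmi[t])
        diff = bmi[t + 1] - running_max
        bmi_max.append(running_max)
        a.append(diff if diff > 0 else 0.0)
    return bmi_max, a


# Hypothetical monthly BMI values for one subject (K = 3).
bmi_max, a = blips_from_bmi([22.0, 21.5, 23.0, 22.5, 24.0])
```

In this made-up series the subject's BMI exceeds its previous maximum twice, so A(t) is nonzero only at t = 1 and t = 3, and the blips add up to the total rise of the running maximum over follow-up.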

A(t) is nonnegative. It follows that $A(m) \ne 0$ only when the individual's observed
data is incompatible with the "time m dietary intervention" through time m.
If an individual's observed data is consistent with his having followed
the "time m dietary intervention", it is consistent with his having followed the
"time t dietary intervention" for t > m.

Note that $Y_{m+1} - Y_m = 0$ whenever A(m) = 0. If $A(m) \ne 0$, $Y_{m+1} - Y_m$
is the difference between (i) a subject's utility when he has his observed
$\overline{BMI}(m+1)$ history and thereafter, possibly contrary to fact, the subject
follows the dietary intervention that guarantees his BMI(k) for k > m + 1
never again exceeds BMI(m + 1) and (ii) his utility when he has his observed
$\overline{BMI}(m)$ history and thereafter, possibly contrary to fact, the subject follows
the dietary intervention that guarantees his BMI at m + 1 equals his observed
$BMI_{\max}(m)$ [rather than his observed BMI at m + 1] and that his BMI(k)
for k > m + 1 never again exceeds $BMI_{\max}(m)$. As a kind of shorthand for the
previous sentence, whenever $A(m) \ne 0$, we will refer to $Y_{m+1} - Y_m$ as the causal
effect of a final blip of exposure of magnitude A(m) on the subject's utility.

An additive locally rank preserving SNM is a deterministic model for the
magnitude of the effect of a treatment A(m) on $Y_{m+1} - Y_m$. Mathematically an
additive locally rank preserving SNM assumes that for each time m = 0, ..., K,

$$Y_{m+1} - Y_m = \gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), \beta^*] \qquad (4)$$

where (i) $\beta^*$ is the unknown true parameter vector, and (ii) $\gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), \beta]$
is a known function [such as $\{\beta_0 + \beta_1 m + \beta_2^T L(m)\} A(m)$] satisfying the
restrictions $\gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), \beta] = 0$ if A(m) = 0 or $\beta = 0$. Here $\beta_2$ is
a column vector of length equal to that of the vector L(m). Furthermore, T as
a superscript denotes the transpose of a matrix or vector. The first restriction
must logically hold because, by definition, if A(m) = 0, $Y_{m+1} = Y_m$. We now
show that the second restriction guarantees that $\beta^* = 0$ encodes the sharp null
hypothesis that "following a diet that prevents one's BMI from ever exceeding
the baseline BMI" has no effect on any subject's utility.

Recalling that $Y_{K+1} = Y$, the model (4) is seen to be equivalent to the model

$$Y_m = Y - \sum_{j=m}^{K} \gamma_j[A(j), \bar{A}(j-1), \bar{L}(j), \beta^*] \qquad (5)$$

for m = 0, 1, ..., K. To help understand equation (5) consider first the special case
m = K. Then equation (5) says that to calculate $Y_K$ from Y, we remove
the causal effect $\gamma_K[A(K), \bar{A}(K-1), \bar{L}(K), \beta^*]$ of exposure A(K) at the last
time K. Next consider the special case m = 0. Then equation (5) says that to
calculate $Y_0$, one successively removes the effect of exposure at times K, K - 1,
..., 0. It follows from the restriction $\gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), 0] = 0$ that
$\beta^* = 0$ implies $\gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), \beta^*] = 0$ for each m. Thus $\beta^* = 0$
encodes the sharp null hypothesis that $Y_0 = Y_m = Y$ for all subjects and all m. In
other words, one's utility at the end of the study will be the same regardless of
whether or not one follows any "time m dietary intervention".

Since, by Eq. (5), a locally rank preserving SNM directly maps an individual's
observed utility Y to the utility an individual would have under the "time m
dietary intervention", it is a model for individual causal effects.
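
To see why (4) and (5) are equivalent, it may help to write the argument out as a telescoping sum. The display below is a reader's sketch that uses only (4) and the identity $Y_{K+1} = Y$; the index j is a dummy summation index.

```latex
\begin{align*}
Y - Y_m &= Y_{K+1} - Y_m = \sum_{j=m}^{K} \left( Y_{j+1} - Y_j \right) \\
        &= \sum_{j=m}^{K} \gamma_j\!\left[ A(j), \bar{A}(j-1), \bar{L}(j), \beta^* \right],
\end{align*}
```

Solving for $Y_m$ gives (5).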

Possible choices of $\gamma_m[a(m), \bar{a}(m-1), \bar{l}(m), \beta]$ include (i) $\beta a(m)$,
(ii) $(\beta_0 + \beta_1 m)\, a(m)$, (iii) $\{\beta_0 + \beta_1 m + \beta_2^T l(m)\}\, a(m)$. In model (i), the
effect of a change of A(m) in BMI is the same for all m. Under model (ii), the
effect varies linearly with time m. Under model (iii), the causal effect of A(m)
is modified by the most recent covariate history.

In the following we assume the observed data O on each subject is $O =
(Y, \bar{L}, \bar{A}) = (Y, \bar{L}(K+1), \bar{A}(K))$. That is, O consists of a subject's utility Y
and his covariate and treatment histories through the end of the study. The
inclusion of $\bar{A}(K)$ is actually redundant, since the A-history $\bar{A}(K)$ is determined
by $\overline{BMI}(K+1)$, and $\overline{BMI}(K+1)$ is a component of $\bar{L}(K+1)$. Thus
we could write the observed data as simply $(Y, \bar{L}(K+1))$. However because
we wish to use results on g-estimation of SNMs that were derived in previous
papers in which $\bar{A}(K)$ was not determined by $\bar{L}(K+1)$, we will continue to
write $O = (Y, \bar{L}, \bar{A})$ and accept some redundancy in the notation. Let

$$Y_m(\beta) = Y - \sum_{j=m}^{K} \gamma_j[A(j), \bar{A}(j-1), \bar{L}(j), \beta] \qquad (6)$$

so, under our model, $Y_m = Y_m(\beta^*)$. Note that, for each $\beta$, $Y_m(\beta)$ can be
computed from the observed data $(Y, \bar{L}, \bar{A})$. Suppose we had a consistent estimate
$\hat\beta$ of $\beta^*$. Then $Y_0(\hat\beta)$ would be a consistent estimate of $Y_0 = Y_0(\beta^*)$. Thus the
average $\sum_{i=1}^{n} Y_{0,i}(\hat\beta)/n$ over the n study subjects would be a consistent
estimate of the parameter of interest $E[Y_0]$. Further $\sum_{i=1}^{n} Y_{0,i}(\hat\beta)/n - \sum_{i=1}^{n} Y_i/n$
would be a consistent estimate of the difference $E[Y_0] - E[Y]$ between the
expected utility $E[Y_0]$ under a dietary intervention guaranteeing BMI never
exceeds the baseline BMI and the expected utility E[Y] in the absence of any
dietary intervention. Below we show how one can obtain a consistent estimate
$\hat\beta$ by g-estimation if a certain comparability assumption holds.
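
The mapping (6) and the resulting estimate of $E[Y_0] - E[Y]$ can be sketched as follows, assuming the simplest blip function $\gamma_j = \beta A(j)$ (a scalar $\beta$ with no dependence on time or covariates) and assuming a value of $\beta$ is already in hand. The names `blip_down` and `mean_effect`, and the numbers in the example, are hypothetical.

```python
from typing import List, Sequence


def blip_down(y: float, a: Sequence[float], beta: float, m: int = 0) -> float:
    """Y_m(beta) = Y - sum_{j=m}^{K} beta * A(j), with a[j] = A(j), j = 0..K."""
    return y - sum(beta * a[j] for j in range(m, len(a)))


def mean_effect(ys: List[float], a_hists: List[Sequence[float]], beta: float) -> float:
    """Sample analogue of E[Y_0] - E[Y] for a given beta."""
    n = len(ys)
    y0_mean = sum(blip_down(y, a, beta) for y, a in zip(ys, a_hists)) / n
    return y0_mean - sum(ys) / n


# Two hypothetical subjects followed for K + 1 = 4 months.
ys = [10.0, 12.0]
a_hists = [[0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]]
effect = mean_effect(ys, a_hists, beta=-0.5)
```

With a negative $\beta$ (weight gain lowers utility), removing the blips raises the first subject's utility and leaves the second subject, who never exceeded his past maximum, unchanged.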

The Innovative Aspect of our SNM: The most important and innovative
aspect of our model is that it models the causal effect on the utility of an
increase in BMI of A(m) over a subject's maximum past BMI, $BMI_{\max}(m)$. It
does not model and thus is agnostic about the causal effect a) of any decrease
in BMI or b) of any increase in BMI between m and m + 1 that fails to result
in one's BMI exceeding $BMI_{\max}(m)$. Our model (4) is thus more robust than
alternative models that would also model a) or b). However, the small number
of assumptions made by our model is sufficient for our purposes; if we can
consistently estimate the parameter $\beta^*$, we can consistently estimate our parameter
of interest $E[Y_0]$.

Thus it only remains to estimate $\beta^*$. In this section we will do so under the
following assumption, which will be weakened in later sections. Define the
indicator variable $\Xi(m)$ taking values in the two-element set {0, 1} by

$$\Xi(m) = 1 \iff BMI(m+1) \ge BMI_{\max}(m) \qquad (7)$$

That is, $\Xi(m)$ takes the value 1 if a subject's BMI just before m + 1 is at least
as great as his maximum BMI up to time m. Otherwise $\Xi(m)$ takes the value 0.

Comparability Assumption (CO): Among subjects with the same $\bar{A}(m-1)$
history and covariate history $\bar{L}(m)$ (which includes BMI history $\overline{BMI}(m)$)
and with $\Xi(m) = 1$, A(m) is statistically independent of the counterfactual
$Y_m$. Formally, conditional on $(\bar{A}(m-1), \bar{L}(m), \Xi(m) = 1)$, A(m) is independent
of $Y_m$. [Since past A-history $\bar{A}(m-1)$ is determined by $\overline{BMI}(m)$, $\bar{A}(m-1)$
in the conditioning event $(\bar{A}(m-1), \bar{L}(m))$ is redundant; nonetheless we shall
retain the $\bar{A}(m-1)$.]

A comparability assumption such as CO is often referred to as an assumption
of no confounding by unmeasured factors or as an assumption of sequential
randomization.

Remark: To understand why we conditioned on $\Xi(m) = 1$ in the CO
assumption, imagine we had instead assumed that A(m) is independent of $Y_m$
conditional on $(\bar{A}(m-1), \bar{L}(m))$. That would have implied that among the
subset of subjects with a given $(\bar{A}(m-1), \bar{L}(m))$, the subgroup with $A(m) \ne 0$
would have the same distribution of the utility $Y_m$ under the time m dietary
intervention as the subgroup with A(m) = 0. But, under the time m intervention,
all subjects in the $A(m) \ne 0$ subgroup would have BMI(m + 1) equal
to their common $BMI_{\max}(m) \in \bar{L}(m)$, while many subjects with A(m) = 0
(specifically, those with $\Xi(m) = 0$) would have $BMI(m+1) < BMI_{\max}(m)$.
Thus, the A(m) = 0 subgroup will have lower BMI at m + 1 than the $A(m) \ne 0$
subgroup under the time m intervention. Suppose the null hypothesis of no
biological BMI effect is false. Then, for an individual with A(m) = 0, their utility
$Y_m$ should depend on their BMI at m + 1. As such, it is extremely unlikely that the
A(m) = 0 and $A(m) \ne 0$ subgroups would be comparable. In contrast, if, as in
assumption CO, we restrict the A(m) = 0 subgroup to the subset of the subjects
with $\Xi(m) = 1$, then given $\bar{L}(m)$, this restricted A(m) = 0 subgroup, like the
$A(m) \ne 0$ subgroup, will have BMI(m + 1) equal to the common $BMI_{\max}(m)$
under the intervention, so the assumption of comparability is plausible.

It is interesting to note that if we had used the coding convention that vector
L(k) includes $\Xi(k)$ as a component, we could then have stated our comparability
assumption as A(m) is independent of $Y_m$ conditional on $(\bar{A}(m-1), \bar{L}(m))$
because, under this coding, $pr(A(k) = 0 \mid \bar{A}(k-1) = \bar{0}(k-1), \bar{L}(k))$ is one
whenever $\Xi(k)$ takes the value zero. However, we will not use this coding
convention.

Under the CO assumption, we can obtain a consistent estimator of $\beta^*$ by
g-estimation as follows. We specify a linear regression model
$E[A(m) \mid \bar{L}(m), \bar{A}(m-1), \Xi(m) = 1] = \alpha^T W(m)$ for m = 0, ..., K. Here
$W(m) = w_m[\bar{L}(m), \bar{A}(m-1)]$ is a vector of covariates calculated from a
subject's past data, $\alpha^T$ is a row vector of unknown parameters, and each
person-month is treated as an independent observation, so each person contributes up
to K + 1 observations. However, person-months for which $\Xi(m) \ne 1$ are
excluded from the regression. An example of $W(m) = w_m[\bar{L}(m), \bar{A}(m-1)]$ would
be the transpose of the row vector $(m, L^T(m), L^T(m-1))$. Let $\hat\alpha$ be the OLS
estimator of $\alpha$ computed using a standard statistical package.

For the moment assume $\beta$ is one dimensional. Let $\beta_{low}$ and $\beta_{up}$ be much
smaller and larger, respectively, than any substantively plausible value of $\beta^*$.
Then, separately, for each $\beta$ on a grid from $\beta_{low}$ to $\beta_{up}$, say $\beta_{low}$, $\beta_{low} + 0.1$,
$\beta_{low} + 0.2$, ..., $\beta_{up}$, perform the score test of the hypothesis $\theta = 0$ in the
extended linear model

$$E[A(m) \mid \bar{L}(m), \bar{A}(m-1), Y_m(\beta), \Xi(m) = 1] = \alpha^T W(m) + \theta Y_m(\beta) \qquad (8)$$

that adds the covariate $Y_m(\beta)$ at each time m to the above (pooled over persons
and time) linear model. A 95% confidence interval for $\beta^*$ is the set of $\beta$ for which
a 0.05-level two-sided score test of the hypothesis $\theta = 0$ does not reject. The
g-estimate $\hat\beta$ of $\beta^*$ is the value of $\beta$ for which the score test takes the value zero
(i.e., the p-value is one).
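
The grid search can be sketched in a few lines. This is only an illustration under strong simplifying assumptions that are not in the text: the blip function is $\gamma_j = \beta A(j)$ with scalar $\beta$, W(m) contains only an intercept, and the score test is replaced by its numerator, the sample covariance sum between A(m) and $Y_m(\beta)$ over person-months with $\Xi(m) = 1$, which is zero exactly when the OLS estimate of $\theta$ is zero. All function names and data are hypothetical.

```python
from typing import List, Tuple

# One record per person-month with Xi(m) = 1: (A(m), Y, sum_{j>=m} A(j)).
Record = Tuple[float, float, float]


def theta_numerator(records: List[Record], beta: float) -> float:
    """Sum of (A(m) - mean A) * (Y_m(beta) - mean Y_m(beta)) over records."""
    n = len(records)
    h = [y - beta * tail for (_, y, tail) in records]  # Y_m(beta) per record
    a_bar = sum(r[0] for r in records) / n
    h_bar = sum(h) / n
    return sum((r[0] - a_bar) * (hm - h_bar) for r, hm in zip(records, h))


def g_estimate_grid(records: List[Record], lo: float, hi: float, step: float) -> float:
    """Return the grid value of beta with the smallest |theta numerator|."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n_points)]
    return min(grid, key=lambda b: abs(theta_numerator(records, b)))


# Hypothetical person-months (from different subjects) generated so that
# Y = 10 - 0.5 * (sum of blips from m onwards), i.e. the true beta is -0.5.
records = [(0.0, 9.5, 1.0), (1.0, 9.0, 2.0), (2.0, 8.5, 3.0), (0.0, 10.0, 0.0)]
beta_hat = g_estimate_grid(records, lo=-2.0, hi=2.0, step=0.25)
```

This sketch returns only the point estimate; the confidence interval described above additionally requires the score test statistic itself at each grid point.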

The validity of g-estimation is proved as follows. By our comparability
assumption $Y_m(\beta^*)$ and A(m) are conditionally independent given
$(\bar{L}(m), \bar{A}(m-1), \Xi(m) = 1)$. That is, $Y_m(\beta^*)$ is not a predictor of A(m)
given $(\bar{L}(m), \bar{A}(m-1), \Xi(m) = 1)$, which implies that the coefficient $\theta$ of $Y_m(\beta)$ must
be zero in the extended model when $\beta = \beta^*$, provided the model
$E[A(m) \mid \bar{L}(m), \bar{A}(m-1), \Xi(m) = 1] = \alpha^T W(m)$ is correctly specified.

Now, we do not know the true value of $\beta$. Therefore, any value $\beta$ for which
the data are consistent with the parameter $\theta$ of the term $\theta Y_m(\beta)$ being zero
might be the true $\beta^*$, and thus belongs in our confidence interval. If consistency
with the data is defined at the 0.05 level, then our confidence interval will have
coverage of 95%. Furthermore, the g-estimate $\hat\beta$ of $\beta^*$ is that $\beta$ for which adding
the term $\theta Y_m(\beta)$ does not help to predict A(m) whatsoever, which is the $\beta$ for
which the score test of $\theta = 0$ is precisely zero. The g-estimate $\hat\beta$ is also the value
of $\beta$ for which the OLS estimator of $\theta$ is precisely zero.

It may appear peculiar that a function $Y_m(\beta)$ of the response Y measured
at end of follow-up is being used to predict A(m) at earlier times. However,
this peculiarity evaporates when one recalls that, for each $\beta$ on our grid, we
are testing the null hypothesis that $\beta = \beta^*$, and, under this null, $Y_m(\beta)$ is the
counterfactual $Y_m$, which we can view as already existing at time m (although
we cannot observe its value until time K + 1 and then only if A(t) in the observed
data is zero from m onwards).

Suppose next that the parameter $\beta$ is a vector. To be concrete suppose we
consider the model with
$\gamma_m[a(m), \bar{a}(m-1), \bar{l}(m), \beta] = a(m)\{\beta_0 + \beta_1 m + \beta_2^T l(m)\}$ so $\beta$ is of
dimension dim(l(m)) + 2 where dim(l(m)) is the dimension of l(m) which, for
concreteness, we take to be 3. Hence $\beta$ is 5 dimensional. Then we would
use a 5 dimensional grid, one dimension for each component of $\beta$. So if we
had 20 grid points for each component, we would have $20^5$ (i.e., 3.2 million)
different values of $\beta$ on our 5 dimensional grid. Now to estimate 5 parameters
one requires 5 additional covariates. Specifically, let
$Q_m(\beta) = q_m[\bar{L}(m), \bar{A}(m-1), Y_m(\beta)]$ be a 5 dimensional vector of functions
of $(\bar{L}(m), \bar{A}(m-1), Y_m(\beta))$, such as
$Q_m(\beta) = [1, m, L^T(m)]^T (Y_m(\beta))^2$. We use the extended model

$$E[A(m) \mid \bar{L}(m), \bar{A}(m-1), Y_m(\beta), \Xi(m) = 1] = \alpha^T W(m) + \theta^T Q_m(\beta).$$

Our g-estimate $\hat\beta$ is the $\beta$ for which the 5 degree of freedom score test that
all 5 components of $\theta$ equal zero is precisely zero. The particular choice of the
functions $q_m$ does not affect the consistency of the point estimate, but it affects
the width of the confidence interval.

When $\gamma_m[a(m), \bar{a}(m-1), \bar{l}(m), \beta] = a(m)\beta^T R_m$ is linear in $\beta$ with $R_m =
r_m(\bar{L}(m), \bar{A}(m-1))$ being a vector of known functions and we choose $Q_m(\beta) =
Q^*_m Y_m(\beta)$ linear in $Y_m(\beta)$, then, given the OLS estimator $\hat\alpha^T$ of $\alpha^T$ in the
model $E[A(m) \mid \bar{L}(m), \bar{A}(m-1), \Xi(m) = 1] = \alpha^T W(m)$, there is an explicit
closed form expression for $\hat\beta$ given by

$$\hat\beta = \left\{ \sum_{i=1,m=0}^{i=n,m=K} \Xi_i(m)\, G_{im}(\hat\alpha)\, Q^*_{im} S_{im}^T \right\}^{-1} \left\{ \sum_{i=1,m=0}^{i=n,m=K} \Xi_i(m)\, Y_i\, G_{im}(\hat\alpha)\, Q^*_{im} \right\} \qquad (9)$$

with $G_{im}(\alpha) = [A_i(m) - \alpha^T W_i(m)]$ and $S_{im} = \sum_{j=m}^{K} A_i(j) R_{ij}$.

Identification: Suppose that two different values of $\beta$, say $\hat\beta$ and $\tilde\beta$, both
make the 5 degree of freedom score test precisely zero and yet the two CIs for
$\beta^*$ centered at $\hat\beta$ and $\tilde\beta$ do not overlap. How should we choose between the
estimates? [In such a case, the matrix whose inverse is required in (9) will not
be invertible and so (9) will fail.] Since we can use any 5 vector $Q_m(\beta) =
q_m[\bar{L}(m), \bar{A}(m-1), Y_m(\beta)]$ in our procedure, one simple approach is to try
other choices of $Q_m(\beta)$ until we find a $Q_m(\beta)$ for which our CI for $\beta^*$ includes
only one of $\hat\beta$ and $\tilde\beta$, declare the one included to be our point estimate
of $\beta^*$ and ignore the excluded one. Will this approach always succeed? In
general this approach should succeed in rather quickly excluding all but one of
the values of $\beta$ that originally made the score test zero, provided that the model
$\gamma_m[a(m), \bar{a}(m-1), \bar{l}(m), \beta]$ is correct, except when $\beta^*$ is not identified. By
definition $\beta^*$ is not identified if there is a $\beta^{**}$ different from the true parameter
$\beta^*$ such that, with an infinite sample size, $\beta^{**}$, like $\beta^*$, makes the 5 degree
of freedom score test precisely zero for all choices of $Q_m(\beta)$. In our model, it
follows from Robins (3, 4) that under the positivity assumption that

$$pr[A(m) = 0 \mid \bar{A}(m-1), \bar{L}(m), \Xi(m) = 1] > 0 \qquad (10)$$

for all subjects and m = 0, ..., K, $\beta^*$ is identified. In our context the positivity
assumption is a very weak assumption that is almost certainly true. Hence, for
the remainder of the paper, we will silently assume that it holds.
Remark: We considered a linear regression model for
$E[A(m) \mid \bar{L}(m), \bar{A}(m-1), Y_m(\beta), \Xi(m) = 1]$ in the above for expositional
simplicity. In practice since $A(m) \ge 0$, we might use a log linear model that
specifies
$E[A(m) \mid \bar{L}(m), \bar{A}(m-1), Y_m(\beta), \Xi(m) = 1] = \exp\{\alpha^T W(m) + \theta^T Q_m(\beta)\}$ and
fit by non-linear least squares. In that case, in (9), $G_{im}(\alpha) =
[A_i(m) - \exp\{\alpha^T W_i(m)\}]$. Alternatively we could replace the response variable
A(m) in the linear regression by $\ln(A(m) + 0.1)$ where the 0.1 is added to
ensure the logarithm remains finite even when A(m) = 0. In that case, $G_{im}(\alpha) =
[\ln\{A_i(m) + 0.1\} - \alpha^T W_i(m)]$.
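
The three residual definitions in this remark differ only in how the fitted value is compared with A(m). A minimal sketch, assuming the linear predictor $\alpha^T W_i(m)$ has already been computed and is passed in as a number; the function name `residual` and the model labels are made up for this example:

```python
import math


def residual(a: float, lin_pred: float, model: str = "linear") -> float:
    """G_im(alpha) for A_i(m) = a and alpha^T W_i(m) = lin_pred."""
    if model == "linear":
        return a - lin_pred
    if model == "loglinear":
        # Mean of A(m) modeled as exp(linear predictor).
        return a - math.exp(lin_pred)
    if model == "log_shifted":
        # Response replaced by ln(A(m) + 0.1), finite even when A(m) = 0.
        return math.log(a + 0.1) - lin_pred
    raise ValueError(f"unknown model: {model}")
```

For a person-month with A(m) = 0, the shifted-log response is ln(0.1), so the residual stays finite, which is the purpose of adding 0.1.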

3.1.2 An additive structural nested mean model (SNMM) 

An additive locally rank preserving SNM (4) implies that if two subjects have the
same observed data $O = (Y, \bar{L}, \bar{A})$ they will have the same value of $Y_0$ under the
"time 0 dietary intervention" of the Introduction. That is, the model implies that
for these subjects, the effect of the "time 0 dietary intervention" will be identical.
This assumption is clearly biologically implausible in view of between-subject
heterogeneity in unmeasured genetic and environmental factors. To overcome
this limitation, we consider an additive structural nested mean model (SNMM)

$$E[Y_{m+1} - Y_m \mid \bar{A}(m), \bar{L}(m)] = \gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), \beta^*] \qquad (11)$$

that models the conditional mean of $Y_{m+1} - Y_m$ given $(\bar{A}(m), \bar{L}(m))$ rather than
the individual differences $Y_{m+1} - Y_m$, and thus does not impose local rank
preservation. In particular $Y_m$ no longer is equal to $Y_m(\beta^*)$. However, Robins (4, 5)
proved the additive SNMM implies (and, in fact, is equivalent to) the assumption
that $Y_m$ and $Y_m(\beta^*)$ have the same mean given $\bar{A}(m)$, $\bar{L}(m)$, $\Xi(m) = 1$.
That is

$$E[Y_m \mid \bar{A}(m), \bar{L}(m)] = E[Y_m(\beta^*) \mid \bar{A}(m), \bar{L}(m)] \qquad (12)$$

for each m. Further he proved that, under the CO assumption, g-estimation of
$\beta^*$ retains all the properties described above, even in the absence of local rank
preservation, except now the function $Q_m(\beta)$ must be chosen linear in $Y_m(\beta)$,
i.e., $Q_m(\beta) = Q^*_m Y_m(\beta)$ as above.

As a consequence, the definition of non-identifiability must be modified
as follows: the parameter $\beta^*$ of an SNMM is not identified if there is a $\beta^{**}$
different from the true parameter $\beta^*$ such that, with an infinite sample size, $\beta^{**}$,
like $\beta^*$, makes the 5 degree of freedom score test precisely zero for all choices of
$Q_m(\beta)$ that are linear in $Y_m(\beta)$. A fuller discussion of rank preserving versus
non-rank-preserving models occurs in Section 3.2.4.

Alternative Approaches: Under the CO assumption, it is shown in ref
[12] that the $E[Y_m]$ are nonparametrically identified for m = 0, ..., K from data
$O = (Y, \bar{L}, \bar{A})$ by the IPTW formula

$$E[Y\, I\{\underline{A}(m) = \underline{0}(m)\}\, W(m)],$$

where $\underline{A}(m) = (A(m), \ldots, A(K))$ and the IPTW weight

$$W(m) = 1 \Big/ \prod_{k=m}^{K} \{pr(A(k) = 0 \mid \bar{A}(k-1), \bar{L}(k))\}^{\Xi(k)} \qquad (13)$$

is the inverse of the conditional probability that a subject had his observed
treatment $\underline{A}(m) = \underline{0}(m)$. That is, $E[Y_m]$ is the weighted mean of the observed
utility Y among subjects whose observed data was consistent with following the
time m dietary intervention with weights given by the inverse of the conditional
probability of having data consistent with following the intervention. Thus
one could, in principle, consider estimating $E[Y_0]$ nonparametrically by the
weighted average of Y among subjects whose weight never exceeded their baseline
weight at age 18 with weights proportional to an estimate $\hat{W}(0)$ of W(0).
That is, by

$$\sum_{\{i;\, \underline{A}_i(0) = \underline{0}(0)\}} Y_i \hat{W}_i(0) \Big/ \sum_{\{i;\, \underline{A}_i(0) = \underline{0}(0)\}} \hat{W}_i(0).$$

The problem with this approach is that only the utility Y of the rare person whose
weight never exceeds his age 18 weight contributes to the analysis. In contrast by
specifying a SNMM, data on the utility Y of every subject contributes to the
estimate of $E[Y_0]$. The price paid for the greater efficiency of a SNMM is the
possibility of bias if the SNMM (11) is misspecified.
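
The nonparametric IPTW estimate just described can be sketched as follows. The sketch assumes that the month-specific probabilities pr(A(k) = 0 | past) have already been estimated by some fitted model and are supplied as plain numbers; the function names and the toy data are hypothetical.

```python
from typing import List, Sequence


def iptw_weight(p_zero: Sequence[float], xi: Sequence[int], m: int = 0) -> float:
    """W(m) = 1 / prod_{k=m}^{K} p_zero[k] ** xi[k], as in (13).

    p_zero[k] is the (estimated) probability that A(k) = 0 given the past;
    months with xi[k] = 0 contribute a factor of one.
    """
    prod = 1.0
    for k in range(m, len(p_zero)):
        if xi[k] == 1:
            prod *= p_zero[k]
    return 1.0 / prod


def iptw_mean_y0(ys: List[float], a_hists: List[Sequence[float]],
                 weights: List[float]) -> float:
    """Weighted mean of Y among subjects whose A(t) is zero at every t."""
    num = den = 0.0
    for y, a, w in zip(ys, a_hists, weights):
        if all(x == 0 for x in a):
            num += w * y
            den += w
    if den == 0.0:
        raise ValueError("no subject is consistent with the time 0 intervention")
    return num / den


w = iptw_weight([0.5, 0.9, 0.25], xi=[1, 0, 1])
est = iptw_mean_y0([10.0, 8.0, 6.0], [[0, 0], [0, 1], [0, 0]], [2.0, 5.0, 6.0])
```

Only the first and third toy subjects enter the average, which illustrates the inefficiency noted above: subjects who ever exceed their past maximum BMI contribute nothing.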

However, under CO, $E[Y_m \mid \bar{A}(m), \bar{L}(m)] = E[Y_m \mid \bar{A}(m-1), \bar{L}(m)]$
and is nonparametrically identified by the formula
$E[Y\, I\{\underline{A}(m) = \underline{0}(m)\}\, W(m) \mid \bar{L}(m), \bar{A}(m-1)]$. Thus $E[Y_{m+1} - Y_m \mid \bar{A}(m), \bar{L}(m)]$
is nonparametrically identified for m = 0, ..., K. Hence, given a sufficiently large
sample size, one could in principle construct misspecification tests of the model
(11) that have power against all alternatives when the model is incorrect. In
practice, the available sample size may greatly limit the power to detect model
misspecification.

IPTW estimation of marginal structural models and the parametric g-formula 
are alternative approaches to model-based estimation of $E[Y_0]$ that also use data
on every subject's utility Y. See Appendix 2 for further discussion of the latter 
approach. 

Remark: The reader familiar with IPTW expects W(m) to be defined as

$$W(m) = 1 \Big/ \prod_{k=m}^{K} pr(A(k) = 0 \mid \bar{A}(k-1) = \bar{0}(k-1), \bar{L}(k))$$

rather than as

$$1 \Big/ \prod_{k=m}^{K} \{pr(A(k) = 0 \mid \bar{A}(k-1) = \bar{0}(k-1), \bar{L}(k))\}^{\Xi(k)}.$$

In fact, the two expressions would have been equal had we used the coding
convention that the vector L(k) includes $\Xi(k)$ as a component because, under this
coding, $pr(A(k) = 0 \mid \bar{A}(k-1) = \bar{0}(k-1), \bar{L}(k))$ is one whenever $\Xi(k)$ takes
the value zero. However, we do not use this convention.

3.2 Case 2: Unmeasured confounding by preclinical dis- 
ease. 

In this section we no longer assume A(m) is statistically independent of $Y_m$ given
$\bar{A}(m-1)$, $\bar{L}(m)$, $\Xi(m) = 1$. To describe our new comparability assumption we
need to introduce some further notation. Let $X = \min(T, \mathcal{D})$ be the minimum
of the time T to death and the time $\mathcal{D}$ to the diagnosis of a chronic disease,
such as cancer, severe emphysema, liver or renal disease, or any other chronic
condition that would be severe enough to affect weight gain. At each time m,
the indicator I(X < m) is a component of L(m). Further if I(X < m) = 1,
the exact time X is observed and included in L(m). Thus X is observed if X
is less than K + 1. However X is censored (i.e., not observed) on subjects whose
X exceeds K + 1, the end of follow-up time. For the present we shall avoid the
additional complications that arise from censoring by assuming that X is less
than K + 1 for all subjects, so that the data $O = (Y, X, \bar{L}, \bar{A})$ is observed on
each subject. In Section 4, we relax this assumption and allow for censoring.

Let $X_m = \min(T_m, \mathcal{D}_m)$ be the counterfactual version of $X = \min(T, \mathcal{D})$
had "the time m dietary intervention" been carried out. Then we make the
following more realistic assumption.

Realistic Comparability (RC) Assumption: A(m) is statistically independent
of $(Y_m, X_m)$ given $\Xi(m) = 1$, $\bar{L}(m)$, $\bar{A}(m-1)$ and $\bar{U}(m) = \bar{0}(m)$,
where U(m) = 1 if a subject has at m, or had prior to m, an undiagnosed chronic
disease that was sufficiently advanced to interfere with his normal weight
trajectory. Otherwise U(m) = 0. We also define U(m) = 1 for subjects alive at m
with X < m under the assumption that there was probably a subclinical period
prior to the time X of clinical diagnosis in which weight gain may have been
altered. Note that U(m) = 0 implies $\bar{U}(m) = (U(0), \ldots, U(m)) = \bar{0}(m)$ is also
zero.

Remark: The RC assumption cannot be recast as $(Y_m, X_m)$ independent of
A(m) given $(\bar{L}(m), \bar{A}(m-1), \bar{U}(m))$ even had we used the coding convention
that vector L(k) includes $\Xi(k)$ as a component, because, even under this coding,
$pr(A(k) = 0 \mid \bar{A}(k-1) = \bar{0}(k-1), \bar{L}(k), \bar{U}(m), (Y_m, X_m))$ would be neither
zero nor one and could, under the RC assumption, depend on $(Y_m, X_m)$
whenever $\bar{U}(m) \ne \bar{0}(m)$ and $\Xi(k) = 1$. For this reason it would perhaps be more
precise to refer to the RC as a selective comparability assumption as it only
implies comparability for a selected subset of the population.

We observe $(Y, X, \bar{L}, \bar{A})$ but $\bar{U}(m) = (U(0), \ldots, U(m))$ is, of course,
generally unobserved when X > m. Thus U is an unmeasured confounder. The
most crucial of several assumptions needed to allow consistent estimation of the
parameter of interest $E[Y_0]$ in this setting is the following.

Clinical Detection (CD) Assumption: Any subject who has U(m) = 1
[i.e., a sufficiently advanced undiagnosed chronic disease at (or before) m] and
thereafter follows "the time m dietary intervention" will either have died or
been diagnosed with clinical chronic disease by time $m + \zeta$, where $\zeta$ is assumed
known. Formally

$$U(m) = 1 \Rightarrow X_m < m + \zeta \qquad (14)$$

or equivalently

$$X_m > m + \zeta \Rightarrow \bar{U}(m) = \bar{0}(m) \qquad (15)$$

Here $\Rightarrow$ is translated as 'implies'. A typical choice for $\zeta$ might be 72 months.
It is useful to choose $\zeta$ to be the minimal time for which (15) holds as this
increases both the efficiency of g-estimators and the power of goodness of fit
tests to detect misspecification of a structural nested model and decreases the
likelihood that an SNM is nonidentified. However if the chosen $\zeta$ is less than
the true minimum time for which (15) holds, bias will result. As a consequence
one should routinely include a table that shows how one's estimate of $E[Y_0]$
changes as $\zeta$ is varied.

The RC and CD assumptions require that one record in X the minimum
time of clinical onset among the set of clinical conditions whose preclinical phase
could affect BMI. Exactly which clinical conditions belong in this set is a
substantive question, about which subject matter experts should be consulted.

Remark: We will later consider the effect of replacing the counterfactual
$X_m$ by the observed X in the CD assumption.

3.2.1 Estimation under a Rank Preserving SNM for $Y_m \mid X_m$ with $X_m$ known

To consistently estimate $E[Y_0]$ under RC and CD we must replace our SNMM
model with an additive SNMM model for $Y_m \mid X_m$ that also conditions on and
allows effect modification by the counterfactual $X_m$. For pedagogic purposes
in this subsection we return to locally rank preserving models. A locally rank
preserving SNM for $Y_m \mid X_m$ states that

$$Y_{m+1} - Y_m = \gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), X_m, \beta^*] \qquad (16)$$

where $\beta^*$ is an unknown parameter and $\gamma_m[A(m), \bar{A}(m-1), \bar{L}(m), X_m, \beta]$ is
a known function that can now depend on $X_m$ and that takes the value zero if either
A(m) = 0 or $\beta = 0$. [We emphasize that it is the counterfactual $X_m$ that occurs in
the last display.] This model is equivalent to assuming

$$Y_m = Y_m(\beta^*) \qquad (17)$$

for each subject with $Y_m(\beta)$ now redefined as

$$Y_m(\beta) = Y - \sum_{j=m}^{K} \gamma_j[A(j), \bar{A}(j-1), \bar{L}(j), X_j, \beta] \qquad (18)$$

Now, of course the counterfactual variable $X_m$ is itself unobserved. However
for pedagogic purposes in this subsection we unrealistically assume that in
addition to the observed data $(Y, X, \bar{L}, \bar{A})$, data on the counterfactuals $X_m$ are
available.

Remark: We do not actually require a locally rank preserving SNM for
$Y_m \mid X_m$. A locally rank preserving SNM for $Y_m \mid c(X_m)$ for a certain known
function c(x) could be used instead. This remark is explored further in the
Appendix.

Redefine Q m ((3) — q m [L(m),A(m — l),X m , Y m (/?)] to possibly be a func- 
tion of X m . Consider again the g-estimator (3 that is equal to the (3 for which 
the 5 degree of freedom score test of 9 = is precisely zero in the model 

E [A (to) \L(m),A(m - 1), X m , S (m) = 1, Y m (/?)] = a T W{m) + 9 T Q m (p) . 

(3 would be a CAN estimator of (3* under the CO assumption, but not under 
the RC assumption. Under RC, the independence needed to make 6 = when 
(3=(3* only holds when we also condition on U (to) . 

However, consider the estimator (3 obtained when, for each time to, we only 
fit the previous model to subjects for whom X m > m + (, excluding all subjects 
with X m < m + (. This exclusion can be expressed by saying that we now fit 
the model 

E [A (to) \L(m),A(m - l),X m , 5 (to) = 1, Y m {(3) , X m > to + (] = a T W(m)+9 T Q m ((3) . 

(19) 

Then the estimator $\hat\beta$ is the $\beta$ for which the 5 degree of freedom score test of
the hypothesis $\theta = 0$ is precisely zero in this latter model. When $X_m > m + \zeta$,
$U(m) = 0(m)$, by assumption CD. Hence we can rewrite the last display as

$$E[A(m) \mid L(m), A(m-1), X_m, \Xi(m) = 1, Y_m(\beta), X_m > m + \zeta, U(m) = 0(m)] = \alpha^T W(m) + \theta^T Q_m(\beta) \qquad (20)$$

showing that we have succeeded in conditioning on $U(m) = 0(m)$, even though
$U(m)$ is unmeasured! It follows that, when the parameter $\beta^*$ is identified,
the estimator $\hat\beta$ is a consistent and asymptotically normal (CAN) estimator
of $\beta^*$ under the RC and CD assumptions, since these assumptions imply the
coefficient $\theta = 0$ if $\beta = \beta^*$. However, as discussed further below, under the RC
and CD assumptions, the positivity assumption no longer suffices to guarantee
identification.

In summary, all that was required to produce a CAN estimator $\hat\beta$ of the
parameter $\beta^*$ of our locally rank preserving SNM (17) for $Y_m \mid X_m$ under the RC
and CD assumptions was to restrict the earlier g-estimation procedure at each
time $m$ to those subjects with $X_m > m + \zeta$.

Thus if $\gamma_m[A(m), A(m-1), L(m), X_m, \beta] = A(m)\,\beta^T R_m$ is linear in $\beta$
with $R_m = r_m(L(m), A(m-1), X_m)$ being a vector of known functions that now can
depend on $X_m$, then, given the OLS estimator $\hat\alpha^T$ of $\alpha^T$ in the model
$E[A(m) \mid L(m), A(m-1), \Xi(m) = 1] = \alpha^T W(m)$ and $Q_m(\beta) = Q_m^* Y_m(\beta)$
linear in $Y_m(\beta)$, the CAN estimator $\hat\beta$ exists in closed form as

$$\hat\beta = \Big\{ \sum_{i=1}^{n} \sum_{m=0}^{K} I[X_{im} > m + \zeta]\, \Xi_i(m)\, G_{im}(\hat\alpha)\, Q_{im}^*\, S_{im}^T \Big\}^{-1} \Big\{ \sum_{i=1}^{n} \sum_{m=0}^{K} I[X_{im} > m + \zeta]\, \Xi_i(m)\, Y_i\, G_{im}(\hat\alpha)\, Q_{im}^* \Big\} \qquad (21)$$

with $G_{im}(\hat\alpha) = A_i(m) - \hat\alpha^T W_i(m)$ and $S_{im} = \sum_{j=m}^{K} A_i(j) R_{ij}$.
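The restriction to subjects with $X_m > m + \zeta$ can be illustrated with a minimal simulation sketch. It assumes a single exposure time, no covariates, $W = 1$, $Q^* = 1$ and $R = 1$, so the closed form reduces to a ratio of two sums. The data-generating values (prevalence of the unmeasured $U$, exposure probabilities, $\beta^* = -2$, $\zeta = 6$) and the function names are hypothetical choices made only for this illustration, and, as in this subsection, the counterfactual $X_0$ is unrealistically treated as observed.

```python
import numpy as np

def restricted_g_estimate(y, a, x0, zeta):
    """Closed-form g-estimate of a scalar beta, using only subjects with x0 > zeta.

    Solves sum_i I[x0_i > zeta] * (a_i - mean(a)) * (y_i - beta * a_i) = 0,
    where mean(a) is taken within the restricted subset (an illustrative
    simplification of the fitted exposure model).
    """
    keep = x0 > zeta
    g = a[keep] - a[keep].mean()
    return float(np.sum(g * y[keep]) / np.sum(g * a[keep]))

def simulate(n=200_000, beta_star=-2.0, zeta=6.0, seed=0):
    """Hypothetical data: U = 1 (subclinical disease) forces X0 < zeta (assumption CD)."""
    rng = np.random.default_rng(seed)
    u = rng.random(n) < 0.3                       # unmeasured confounder
    x0 = np.where(u, rng.uniform(0.0, zeta, n), rng.uniform(0.0, 40.0, n))
    a = (rng.random(n) < np.where(u, 0.2, 0.6)).astype(float)  # exposure depends on U only
    y0 = x0 + rng.normal(0.0, 1.0, n)             # counterfactual outcome
    y = y0 + beta_star * a                        # rank preserving blip
    return y, a, x0

y, a, x0 = simulate()
beta_naive = restricted_g_estimate(y, a, x0, zeta=-np.inf)   # no restriction
beta_restricted = restricted_g_estimate(y, a, x0, zeta=6.0)  # X0 > zeta only
print(beta_naive, beta_restricted)
```

In this sketch the unrestricted estimate is confounded by $U$, while the restricted estimate should lie close to the assumed $\beta^*$, because every retained subject has $U = 0$.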

From the above, it follows that if, in addition to the observed data $(Y, X, L, A)$,
data on the counterfactuals $X_m$ are available for each $m$, the sample average
$\sum_i Y_{0,i}(\hat\beta)/n$ is a CAN estimator of the parameter of interest $E[Y_0]$ under the
RC and CD assumptions, provided $\beta^*$ is identified and both our locally rank
preserving SNM for $Y_m \mid X_m$ and our model

$$E[A(m) \mid L(m), A(m-1), \Xi(m) = 1] = \alpha^T W(m) \qquad (22)$$

are correct. Of course data on $X_m$ are unavailable. However, in the next
subsection we prove that an analogue of this result holds without data on $X_m$
under a locally rank preserving SNFTM for the $X_m$, which allows us to replace
$X_m$ by an estimate $X_m(\hat\psi)$, where $\hat\psi$ estimates the parameter $\psi^*$ of our SNFTM.

Before proceeding to the next subsection, several additional points need to be
made.

Can we replace $X_m$ by $X$: A natural question that arises is the following.
Suppose we replaced $X_m$ by the observed $X$ in the CD assumption, in our
definition of $Y_m(\beta)$, and wherever else $X_m$ occurs in this subsection, with the
exception of the RC assumption (as the RC assumption with $X$ replacing $X_m$
would clearly be false if BMI is a cause of $T$ and/or $C$ and thus of $X$). Do $\hat\beta$
and $\sum_i Y_{0,i}(\hat\beta)/n$ remain CAN estimators of $\beta^*$ and $E[Y_0]$? This question is
natural in the sense that it is not obvious that the CD assumption and RP SNM
based on $X_m$ are more likely to be true than when based on $X$. So were the
answer "yes", it would be simpler and more straightforward to use $X$ in place
of $X_m$. In particular, since $X$, unlike $X_m$, is observed, we would eliminate the
need to replace $X_m$ with the estimator $X_m(\hat\psi)$, thereby greatly simplifying the
analysis.

Unfortunately, $\hat\beta$ and thus $\sum_i Y_{0,i}(\hat\beta)/n$ do not remain consistent when we
use $X$ in place of $X_m$. To see why, consider the model

$$E[A(m) \mid L(m), \Xi(m) = 1, A(m-1), X, Y_m(\beta), X > m + \zeta] = \alpha^T W(m) + \theta^T Q_m^* Y_m(\beta) \qquad (23)$$

which has replaced $X_m$ in Eq. (19) with $X$. Clearly $\hat\beta$ will only be consistent
for the parameter $\beta^*$ of our locally RP SNM if $\theta = 0$ when $\beta = \beta^*$. That
is, $\hat\beta$ will only be consistent if $Y_m = Y_m(\beta^*)$ is independent of $A(m)$ given
$(L(m), A(m-1), \Xi(m) = 1, X, X > m + \zeta)$. Now by the CD assumption with
$X$ replacing $X_m$, $X > m + \zeta$ implies $U(m) = 0(m)$. Thus consistency of $\hat\beta$
requires $Y_m(\beta^*)$ independent of $A(m)$ given
$(L(m), A(m-1), \Xi(m) = 1, X, X > m + \zeta, U(m) = 0(m))$. However, we
show in the next paragraph that this independence statement is not implied by
the RC assumption and thus will generally be false, unless $A(k)$ has no causal
effect on $X$ for $k \ge m$, in which case $X = X_m$ for each subject and we are back
to Eq. (20).

When $A(m)$ has a causal effect on $X$ (whether directly or through $A(k)$, $k > m$),
then $X$ is a common effect of two causes $A(m)$ and $X_m$ that are independent
conditional on the event $(L(m), A(m-1), \Xi(m) = 1, U(m) = 0(m), Y_m(\beta^*))$.
Therefore, conditional on both the previous event and $(X, X > m + \zeta)$, $A(m)$
and $X_m$ are dependent, and thus so are $A(m)$ and $Y_m(\beta^*)$, since $X_m$ and
$Y_m(\beta^*)$ are highly correlated, as both are functions of $T_m$.

However, even when $A(m)$ has a causal effect on $X$, a slight modification of
the above estimation procedure can be used to obtain CAN estimators of $\beta^*$ in
the special case in which $A(m)$ has a known minimal latent period $\chi$ for its
effect on $X$ of at least $\zeta$ months.

Definition of Minimal Latent Period (MLP) for effect on $X$: $A(m)$
has a minimal latent period for its effect on $X$ of $\chi$ months if, for every subject
and each time $k > m$, $X_k > m + \chi \Leftrightarrow X_m > m + \chi$ and $X_k = X_m$ if
$X_m < m + \chi$. In particular, by taking $k = K + 1$, the last two statements become
$X > m + \chi \Leftrightarrow X_m > m + \chi$ and $X = X_m$ if $X_m < m + \chi$.

When a known minimal latent period $\chi$ exceeds $\zeta$, we can obtain CAN
estimators of $\beta^*$ by simply replacing $X > m + \zeta$ by $m + \chi > X > m + \zeta$ in model (23),
since then the event $(X, m + \chi > X > m + \zeta)$ is the event $(X_m, m + \chi > X_m > m + \zeta)$
and we are back in the setting of Eq. (19), except for the additional restriction
$m + \chi > X_m$, which does not introduce bias. Thus the existence of a minimal
latent period of length $\chi$ greater than $\zeta$ allows us to estimate $\beta^*$ and $E[Y_0]$
without the need to specify a SNFTM for the $X_m$.

We now prove that under the RC and CD assumptions, a MLP of length $\chi$
greater than $\zeta$ implies that

$$X \perp\!\!\!\perp A(m) \mid L(m), \Xi(m) = 1, A(m-1), m + \chi > X > m + \zeta.$$

It follows that, taking the RC and CD assumptions as given, we can test the
hypothesis that a MLP of length $\chi$ greater than $\zeta$ exists by testing whether
the last display is true. In fact, a test of the hypothesis that the parameter $\psi^*$
of a SNFTM is zero serves as a test of the previous display. To prove our previous
claim note that, by the MLP assumption, the event $m + \chi > X > m + \zeta$ is
the event $m + \chi > X_m > m + \zeta$ which, by the CD assumption, is the event
$\{m + \chi > X_m > m + \zeta,\ U(m) = 0(m)\}$. Thus the last display is, under the MLP and
CD assumptions, equivalent to the statement "$X_m$ is independent of $A(m)$ given
$(L(m), \Xi(m) = 1, A(m-1), m + \chi > X_m > m + \zeta, U(m) = 0(m))$", which is
true by the RC assumption.

Most experts believe it to be substantively implausible that an increase in
BMI has a minimum latent period of more than 72 months, our default choice
for $\zeta$. In contrast, in occupational cohort studies of the effect of a chemical
carcinogen on time to clinical cancer, minimum latent periods of up to 10 years
are commonly assumed.



3.2.2 Estimation of $E[Y_0]$ under a Rank Preserving SNFTM:

As mentioned above, an analogue of the above results holds when data on $X_m$ are
unavailable under a locally rank preserving SNFTM for $X_m$. The simplest
locally rank preserving SNFTM specifies that

$$X_m = m + \int_m^X \exp(\psi^* A(t))\,dt \quad \text{if } X > m \qquad (24)$$

$$X_m = X \quad \text{if } X < m, \qquad (25)$$

where $\psi^*$ is an unknown parameter and $A(t)$ is as defined previously when $t$ is
a whole number of months and $A(t) = A(\lfloor t \rfloor)$ when $t$ is not a whole number,
where $\lfloor t \rfloor$ is the largest integer less than or equal to $t$. Thus, by the definition
of an integral as the area under a curve,

$$\int_m^X \exp(\psi^* A(t))\,dt = \sum_{j=m}^{\lfloor X \rfloor - 1} \exp(\psi^* A(j)) + (X - \lfloor X \rfloor)\exp(\psi^* A(\lfloor X \rfloor)).$$
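The piecewise-constant integral above can be sketched in a few lines. The function name `x_m`, the argument layout (a list `a` of monthly exposures indexed from month 0) and the numeric inputs are illustrative assumptions, not part of the original analysis; subjects with $X \le m$ are returned unchanged.

```python
import math

def x_m(psi, a, x, m):
    """Blipped-down failure time X_m(psi) for the one-parameter SNFTM (24)-(25).

    psi : candidate parameter value
    a   : sequence with a[j] = exposure A(j) during month j (assumed layout)
    x   : observed failure time X in months
    m   : month of the hypothetical intervention
    """
    if x <= m:
        return x
    last = math.floor(x)
    # Whole months j = m, ..., floor(X) - 1 each contribute exp(psi * A(j)).
    total = sum(math.exp(psi * a[j]) for j in range(m, last))
    # The final partial month contributes (X - floor(X)) * exp(psi * A(floor(X))).
    if x > last:
        total += (x - last) * math.exp(psi * a[last])
    return m + total

exposures = [0, 1, 1, 0, 2, 0]          # hypothetical monthly exposure history
null_value = x_m(0.0, exposures, 4.5, 2)
blipped = x_m(0.1, exposures, 4.5, 2)
print(null_value, blipped)
```

With `psi = 0.0` the function returns the observed failure time itself, which mirrors the sharp null hypothesis discussed next.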

A locally rank preserving SNFTM directly maps an individual's observed failure
time $X$ to the failure time $X_m$ the individual would have under the "time $m$
dietary intervention". Thus it is a model for individual causal effects. If $\psi^* = 0$,
$\exp(\psi^* A(t)) = 1$ and thus $X_m = m + \int_m^X dt = m + X - m = X$ for any $m$. Hence
$\psi^* = 0$ encodes the sharp null hypothesis that $X_0 = X$ for all subjects, i.e., the
"time 0 dietary intervention" has no effect on any subject's $X = \min(T, C)$. It
is useful to note that when $\psi^* \ne 0$, the SNFTM (24)-(25) implies that there is
no minimal latent period for the effect of treatment on $X$.

A general class (although not the most general class) of locally RP SNFTMs
that includes the above one parameter model assumes

$$X_m = m + \int_m^X \exp\{\omega(\bar A(t), \bar L(t), \psi^*)\}\,dt \quad \text{if } X > m \qquad (26)$$

$$X_m = X \quad \text{if } X < m \qquad (27)$$

where $\omega(\bar A(t), \bar L(t), \psi) = \omega(A(t), \bar A(t^-), \bar L(t), \psi)$ is a known function
satisfying $\omega(A(t), \bar A(t^-), \bar L(t), \psi) = 0$ if $A(t) = 0$ or $\psi = 0$, and $\bar A(t^-)$ is the
$A$-history until just prior to time $t$. For example, we might have
$\omega(A(t), \bar A(t^-), \bar L(t), \psi) = A(t)\{\psi_0 + \psi_1^T L(t)\}$, where $L(t) = L(\lfloor t \rfloor)$ and
$L(\lfloor t \rfloor)$ is as defined earlier.

We next turn to estimation of $\psi^*$. For the moment, suppose the CO assumption,
modified to have $(Y_m, X_m)$ in place of $Y_m$, held and that $(Y, X, L, A)$ was
observed. Then we could consistently estimate $\psi^*$ by g-estimation. Specifically
we define

$$X_m(\psi) = m + \int_m^X \exp\{\omega(\bar A(t), \bar L(t), \psi)\}\,dt \quad \text{if } X > m \qquad (28)$$

$$X_m(\psi) = X \quad \text{if } X < m \qquad (29)$$

so under our model, $X_m = X_m(\psi^*)$. Note that, for each $\psi$, $X_m(\psi)$ can be
computed from the observed data. Suppose, for concreteness, $\psi^*$ is 5 dimensional so
we search over a 5 dimensional grid. We let $Q_m^{**}(\psi) = q_m^{**}[L(m), A(m-1), X_m(\psi)]$
be a 5 dimensional vector of functions of $(L(m), A(m-1), X_m(\psi))$ such as
$Q_m^{**}(\psi) = X_m(\psi)\,[1, m, L^T(m)]$. We use an extended linear model

$$E[A(m) \mid L(m), A(m-1), \Xi(m) = 1, X_m(\psi)] = \alpha^T W(m) + \theta^T Q_m^{**}(\psi).$$

Our g-estimate $\hat\psi$ is the $\psi$ for which the 5 degree of freedom score test that
all 5 components of $\theta$ equal zero is precisely zero. Since, by the modified CO
assumption, $\theta = 0$ if $\psi = \psi^*$, the g-estimate $\hat\psi$ is CAN for $\psi^*$. The particular
choice of the functions $Q_m^{**}(\psi)$ does not affect the consistency of the point
estimate, but it determines the width of its confidence interval. Because $X_m(\psi)$
is a nonlinear function of $\psi$, there is not a closed form expression for $\hat\psi$. However,
the equation solved by $\hat\psi$ is a smooth function of $\psi$, so standard methods for
solving nonlinear equations such as the Newton-Raphson algorithm can be used
to compute $\hat\psi$.

Next suppose the observed data is still $(Y, X, L, A)$, but the modified CO
assumption does not hold. Rather, the CD and RC assumptions hold. Define
the estimator $\hat\psi$ as the $\psi$ for which the 5 degree of freedom score test of the
hypothesis $\theta = 0$ is precisely zero in the model

$$E[A(m) \mid L(m), A(m-1), \Xi(m) = 1, X_m(\psi), X_m(\psi) > m + \zeta] = \alpha^T W(m) + \theta^T Q_m^{**}(\psi).$$

Note the set of subjects who do not contribute to the score test of $\theta = 0$ (i.e.,
subjects with $X_m(\psi) < m + \zeta$) depends on $\psi$. When $X_m = X_m(\psi^*) > m + \zeta$, then
$U(m) = 0(m)$, by assumption CD. Hence, at $\psi = \psi^*$, our procedure conditions
on $U(m) = 0(m)$. It follows that, provided $\psi^*$ is identified, the estimator $\hat\psi$ is
a CAN estimator of $\psi^*$ under the RC assumption, as that assumption implies
the coefficient $\theta = 0$ if $\psi = \psi^*$. However, under the CD and RC assumptions,
the positivity assumption does not guarantee identification.
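A minimal sketch of the restricted g-estimation of $\psi$ follows, for a single exposure time with no covariates, where $X_0(\psi) = X\exp(\psi A)$ and $Q^{**}(\psi) = X_0(\psi)$. The simulated data, the grid and the function names are hypothetical; a grid search stands in for Newton-Raphson because, with the restriction, the set of contributing subjects changes with $\psi$ and the empirical score is not smooth in a finite sample.

```python
import numpy as np

def score(psi, x, a, zeta):
    """Empirical covariance of A and X0(psi) among subjects with X0(psi) > zeta."""
    x_psi = x * np.exp(psi * a)
    keep = x_psi > zeta                 # the contributing subset depends on psi
    g = a[keep] - a[keep].mean()
    return float(np.mean(g * x_psi[keep]))

def g_estimate_psi(x, a, zeta, grid):
    """Return the grid value of psi whose score is closest to zero."""
    scores = np.array([score(p, x, a, zeta) for p in grid])
    return float(grid[np.argmin(np.abs(scores))])

# Hypothetical rank preserving data with true psi* = 0.3 and zeta = 6.
rng = np.random.default_rng(1)
n, psi_star, zeta = 100_000, 0.3, 6.0
u = rng.random(n) < 0.3                                     # unmeasured confounder
x0 = np.where(u, rng.uniform(0.0, zeta, n), rng.uniform(0.0, 40.0, n))
a = (rng.random(n) < np.where(u, 0.2, 0.6)).astype(float)   # exposure depends on U
x = x0 * np.exp(-psi_star * a)                              # so X * exp(psi* A) = X0

grid = np.linspace(-1.0, 1.0, 201)
psi_restricted = g_estimate_psi(x, a, zeta, grid)
psi_unrestricted = g_estimate_psi(x, a, -np.inf, grid)      # no restriction
print(psi_restricted, psi_unrestricted)
```

Under these assumptions the restricted estimate should fall near the assumed $\psi^*$, while the unrestricted estimate is pulled away from it by confounding through $U$.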

Now let $\hat\beta(\hat\psi)$ be defined like $\hat\beta$ except that everywhere $X_m(\hat\psi)$ replaces
$X_m$, so that $\hat\beta(\hat\psi)$ is a function of the data $(Y, X, L, A)$ only. Next define

$$Y_m(\beta, \psi) = Y - \sum_{j=m}^{K} \gamma_j[A(j), A(j-1), L(j), X_j(\psi), \beta] \qquad (30)$$

Note, by both models (17) and (26)-(27) being locally rank preserving,
$Y_m(\beta^*, \psi^*) = Y_m$. Thus, when $\psi^*$ and $\beta^*$ are identified, the sample average
$\sum_i Y_{0,i}(\hat\beta(\hat\psi), \hat\psi)/n$ is a CAN estimator of the parameter of interest $E[Y_0]$
under the RC and CD assumptions, provided our locally rank preserving SNM (17)
for $Y_m \mid X_m$, our locally rank preserving SNFTM (26)-(27) for $X_m$, and our
model (22) are all correctly specified.



3.2.3 Estimation of $E[Y_0]$ under a SNMM and a SNFTM without
Rank Preservation:

As discussed earlier, the assumption of local rank preservation is biologically
implausible. Thus we will no longer assume that our locally rank preserving models
(17) and (26)-(27) are true. As a consequence we can no longer assume that there
exists some $(\beta^*, \psi^*)$ such that the unobserved counterfactuals $(X_m, Y_m)$ equal
the observed $(X_m(\psi), Y_m(\beta, \psi))$ when $(\beta, \psi) = (\beta^*, \psi^*)$. However, suppose,
with $(X_m(\psi), Y_m(\beta, \psi))$ still defined by (28)-(29) and (30), we assume that, for
each $m$, there exists some $(\beta^*, \psi^*)$ such that

Assumption (i): when $\psi = \psi^*$, $X_m$ and $X_m(\psi)$ have the same conditional
distribution given $(A(m), L(m), A(m-1))$, and

Assumption (ii):

$$E[Y_m \mid A(m), L(m), A(m-1), X_m = x] = E[Y_m(\beta^*, \psi^*) \mid A(m), L(m), A(m-1), X_m(\psi^*) = x] \qquad (31)$$

In contrast with the assumption of local RP, there is no a priori biological
reason to exclude the possibility that (i) and (ii) both hold.

When assumptions (i) and (ii) hold for each $m$, we say the SNMM

$$\gamma_m[A(m), A(m-1), L(m), x, \beta] \qquad (32)$$

for $Y_m \mid X_m$ and the SNFTM (28)-(29) for $X_m$ jointly hold with true
parameter $(\beta^*, \psi^*)$. If the RC and CD assumptions, the model (22), and (i) and
(ii) all hold, then $\hat\psi$, $\hat\beta(\hat\psi)$, and $\sum_i Y_{0,i}(\hat\beta(\hat\psi), \hat\psi)/n$ as defined previously
are CAN for $\psi^*$, $\beta^*$, and the parameter of interest $E[Y_0]$ respectively, provided
$(\beta^*, \psi^*)$ are identified and we choose $Q_m(\beta)$ linear in $Y_m(\beta)$. [In contrast,
$Q_m^{**}(\psi)$ need not be chosen linear in $X_m(\psi)$.] In summary, $\hat\psi$, $\hat\beta(\hat\psi)$, and
$\sum_i Y_{0,i}(\hat\beta(\hat\psi), \hat\psi)/n$ have the same statistical properties under our joint
SNMM for $Y_m \mid X_m$ and SNFTM for $X_m$ when local rank preservation
does not hold as when it does.



3.2.4 Are Remarkable Results due to Some Sleight of Hand?

The result summarized in the last sentence is striking for a number of reasons.
Our comparability assumption, i.e. the RC assumption, only assumes no
unmeasured confounding conditional on $U(m)$. Yet neither the SNMM for $Y_m \mid X_m$
nor the SNFTM for $X_m$ is a model for causal effects conditional on the unmeasured
$U(m)$. Thus, it is remarkable that these models can be used to estimate causal
contrasts such as $E[Y_0] - E[Y]$ under the RC and CD assumptions. Furthermore,
even though $X_m > m + \zeta$ implies $U(m) = 0(m)$ by the CD assumption,
nonetheless, in the absence of local rank preservation, $X_m(\psi^*) > m + \zeta$ does
not imply $U(m) = 0(m)$. Hence when local rank preservation does not hold,
even though we condition on $X_m(\psi) > m + \zeta$ in computing our g-estimates
$\hat\psi$, $\hat\beta(\hat\psi)$, we do not thereby restrict the analysis to a subset of subjects all of
whom have the same value of $U(m)$; thus one might guess that confounding by the
unmeasured $U(m)$ has not been controlled and that our estimates $\hat\psi$, $\hat\beta(\hat\psi)$, and
$\sum_{i=1}^{n} Y_{0,i}(\hat\beta(\hat\psi), \hat\psi)/n$ must be inconsistent. Remarkably, such is not the
case.

How did we pull off the seemingly remarkable 'magic' described in the
preceding paragraph? We shall investigate whether we used some subtle "sleight of
hand". We use a simple paradigmatic instance of our model that only involves
a single time-independent exposure to guide our investigation. Specifically, we
next provide an explicit proof, containing no "sleight of hand", of our results
in the case of a time-independent exposure. The general case is treated in
the Appendix. The reader who is more interested in the methodology and less
interested in foundational issues may feel free to skip ahead to Section 4.

Paradigmatic Instance of a Time-Independent Exposure: We suppose
that $K + 1 = 1$ so time 0 is the only time of exposure. Further we assume
there are no covariates. In this setting the RC assumption becomes $(Y_0, X_0)$
independent of $A(0)$ given the unmeasured confounder $U(0) = 0$. The CD
assumption becomes $X_0 > \zeta$ implies $U(0) = 0$. Our SNFTM for $X_0$ becomes

Assumption (i): $X_0(\psi) = X\exp(\psi A(0))$ and $X_0$ have the same conditional
distribution given $A(0)$ at $\psi = \psi^*$,

while our SNMM for $Y_m \mid X_m$ becomes

Assumption (ii): $E[Y_0 \mid A(0), X_0 = x] = E[Y_0(\beta^*, \psi^*) \mid A(0), X_0(\psi^*) = x]$,
where $Y_0(\beta, \psi) = Y - \gamma_0[A(0), X_0(\psi), \beta]$.

Neither model makes any reference to $U(0)$ and thus neither is a model
for causal effects conditional on $U(0)$. Furthermore, although $X_0 > \zeta$ implies
$U(0) = 0$ by the CD assumption, nonetheless $X_0(\psi^*) > \zeta$ does not imply
$U(0) = 0$. Now to prove our results.

Proofs Of Our Results: Proof that $\hat\psi$ is CAN for $\psi^*$: By assumption (i),
$pr[X_0(\psi^*) > t \mid A(0), X_0(\psi^*) > \zeta] = pr[X_0 > t \mid A(0), X_0 > \zeta]$. But, by the CD
and then the RC assumptions,
$pr[X_0 > t \mid A(0), X_0 > \zeta] = pr[X_0 > t \mid A(0), X_0 > \zeta, U(0) = 0] = pr[X_0 > t \mid X_0 > \zeta, U(0) = 0]$.
Hence $pr[X_0(\psi^*) > t \mid A(0), X_0(\psi^*) > \zeta]$ is not
a function of $A(0)$. We conclude that $A(0)$ and $X_0(\psi^*)$ are independent given
$X_0(\psi^*) > \zeta$. Thus $E[A(0) \mid X_0(\psi^*), X_0(\psi^*) > \zeta] = \alpha + \theta X_0(\psi^*)$ has
coefficient $\theta = 0$ so, when $\psi^*$ is identified, the $\hat\psi$ for which the score test of $\theta = 0$
takes the value 0 is CAN for $\psi^*$.

Proof that $\hat\beta(\hat\psi)$ is CAN for $\beta^*$: By Assumption (ii),
$E[Y_0(\beta^*, \psi^*) \mid A(0), X_0(\psi^*) = x, X_0(\psi^*) > \zeta] = E[Y_0 \mid A(0), X_0 = x, X_0 > \zeta]$.
But, by the CD and then the RC assumptions,
$E[Y_0 \mid A(0), X_0 = x, X_0 > \zeta] = E[Y_0 \mid A(0), X_0 = x, X_0 > \zeta, U(0) = 0]
= E[Y_0 \mid X_0 = x, X_0 > \zeta, U(0) = 0]$ is not a function of $A(0)$.
Thus, $0 = E[Y_0\{A(0) - E[A(0) \mid X_0(\psi^*), X_0(\psi^*) > \zeta]\}]$.
Hence $0 = E[Y_0(\beta^*, \psi^*)\{A(0) - E[A(0) \mid X_0(\psi^*) > \zeta]\}]$. As a consequence,
the $\hat\beta(\hat\psi)$ for which the score test of $\theta = 0$ takes the value 0 in the model
$E[A(0) \mid X_0(\hat\psi) > \zeta, Y_0(\beta, \hat\psi)] = \alpha + \theta Y_0(\beta, \hat\psi)$ is CAN for $\beta^*$, when $\beta^*$
and $\psi^*$ are identified.

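Proof that $\sum_{i=1}^{n} Y_{0,i}(\hat\beta(\hat\psi), \hat\psi)/n$ is CAN for $E[Y_0]$:

$$E[Y_0] = \iint E[Y_0 \mid A(0), X_0 = x]\, dF_{X_0}(x \mid A(0))\, dF(A(0))$$
$$= \iint E[Y_0(\beta^*, \psi^*) \mid A(0), X_0(\psi^*) = x]\, dF_{X_0(\psi^*)}(x \mid A(0))\, dF(A(0)) = E[Y_0(\beta^*, \psi^*)]$$

by assumptions (i) and (ii). Hence $\sum_{i=1}^{n} Y_{0,i}(\hat\beta(\hat\psi), \hat\psi)/n$ is CAN for
$E[Y_0(\beta^*, \psi^*)] = E[Y_0]$, when $\beta^*$ and $\psi^*$ are identified.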

This completes the promised proof of our results in the time-independent
case. The proof in the Appendix of the general time-dependent case is not much
more difficult when one proceeds by induction. We conclude that no sleight of hand
occurred in the proof.



Do Correctly Specified SNMMs for $Y_m \mid X_m$ and SNFTMs for $X_m$
Always Exist? Perhaps the sleight of hand occurred right at the start, when
we supposed that there exist $(\beta^*, \psi^*)$ such that assumptions (i) and (ii) hold.
We now prove that no such sleight of hand is afoot. Specifically we prove that
there always exist correctly specified SNMMs for $Y_m \mid X_m$ and SNFTMs for $X_m$.
[This result does not, of course, imply that the particular SNMM and SNFTM
we actually choose to analyze are correct.] We actually prove this result for an
alternative, more intuitive, definition of a SNMM for $Y_m \mid X_m$ and a SNFTM
for $X_m$ and then prove these alternative definitions are logically equivalent to
assumptions (i) and (ii). This is done in this subsection for the special case of
a time-independent exposure and in the Appendix for a general time-varying
exposure.

Consider again, for simplicity, our paradigmatic instance. Write $A(0)$ as $A$.
Suppose that $Y, Y_0, X, X_0$ are all non-negative continuous random variables with
support on $(0, \infty)$, satisfying the consistency assumption $X = X_0$ and $Y = Y_0$
if $A = 0$. Let $S(x \mid A) = pr(X > x \mid A)$. Let $S_0(x \mid A) = pr(X_0 > x \mid A)$. Let
$S_0^{-1}(x \mid A)$ be the inverse of $S_0(x \mid A)$ wrt the $x$ argument. Define the function
$x_0^\dagger(x, A) = S_0^{-1}[\{S(x \mid A)\} \mid A]$. Substituting 0 for $A$, we find $x_0^\dagger(x, 0) = x$, so

$$x_0^\dagger(X, 0) = X \text{ wp1} \qquad (33)$$

Define $X_0^\dagger = x_0^\dagger(X, A)$. Then $X_0^\dagger = x_0^\dagger(X, 0) = X$ when $A = 0$. It is well known
that $X_0^\dagger = x_0^\dagger(X, A)$ and $X_0$ have the same conditional distribution given $A$.
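The quantile-to-quantile map $S_0^{-1}[S(x \mid A) \mid A]$ can be made concrete with a small sketch. It assumes, purely for illustration, that within an exposure stratum both survival functions are exponential with hypothetical rates `rate_obs` (for $X$) and `rate_0` (for $X_0$); the function name and the rates are not taken from the original analysis.

```python
import numpy as np

def x0_dagger(x, rate_obs, rate_0):
    """Map x to S_0^{-1}[S(x | A) | A] when both survival curves are exponential.

    S(x | A) = exp(-rate_obs * x) and S_0(x | A) = exp(-rate_0 * x), so the
    composed map is -log(S(x | A)) / rate_0.
    """
    s = np.exp(-rate_obs * x)       # observed-scale survival probability
    return -np.log(s) / rate_0      # time with the same survival probability under S_0

rng = np.random.default_rng(3)
x_exposed = rng.exponential(scale=1.0 / 0.5, size=100_000)   # X | A = a, rate 0.5
mapped = x0_dagger(x_exposed, rate_obs=0.5, rate_0=0.25)     # target: rate 0.25
identity = x0_dagger(x_exposed, rate_obs=0.25, rate_0=0.25)  # equal curves: x maps to x
print(mapped.mean())
```

In this example the mapped values follow the assumed $X_0$ distribution (mean $1/0.25 = 4$), and the map is the identity whenever the two survival curves coincide, as at $A = 0$.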



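Define $S^\dagger(t \mid A, X_0^\dagger = x) = pr[Y > t \mid A, X_0^\dagger = x]$ and
$S_0(t \mid A, X_0 = x) = pr(Y_0 > t \mid A, X_0 = x)$. Let $S_0^{-1}(t \mid A, X_0 = x)$ be the inverse
of $S_0(t \mid A, X_0 = x)$ wrt the $t$ argument. Let
$y_0^\dagger(t, x, A) = S_0^{-1}(\{S^\dagger(t \mid A, X_0^\dagger = x)\} \mid A, X_0 = x)$
and $Y_0^\dagger = y_0^\dagger(Y, X_0^\dagger, A)$. Then $Y_0^\dagger \mid A, X_0^\dagger = x$ and $Y_0 \mid A, X_0 = x$ have the same
conditional distribution. It follows that $(Y_0^\dagger, X_0^\dagger) \mid A$ and $(Y_0, X_0) \mid A$ have the
same joint conditional distribution. Thus,

$$E[Y_0^\dagger \mid X_0^\dagger = x, A] = E[Y_0 \mid X_0 = x, A] \qquad (34)$$

Define

$$\gamma^\dagger(A, x) = E[Y - Y_0^\dagger \mid X_0^\dagger = x, A] \qquad (35)$$

The last two displays imply that

$$E[Y - \gamma^\dagger(A, X_0^\dagger) \mid X_0^\dagger = x, A] = E[Y_0^\dagger \mid X_0^\dagger = x, A] = E[Y_0 \mid X_0 = x, A] \qquad (36)$$

and

$$\gamma^\dagger(0, X) = 0 \text{ wp1} \qquad (37)$$

since, by $Y_0^\dagger = y_0^\dagger(Y, X_0^\dagger, A)$, $Y_0^\dagger = Y$ when $A = 0$.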

Here are the alternative definitions of a SNFTM for $X_0$ and a SNMM for
$Y_0 \mid X_0$.

Definition a: Let $x_0(t, a, \psi)$ be a known function, monotone increasing in $t$
for each $(a, \psi)$, satisfying $x_0(t, a, \psi) = t$ if $a = 0$ or $\psi = 0$. We say $x_0(t, a, \psi)$
is a correctly specified SNFTM for $X_0$ if there exists $\psi^*$ such that
$X_0(\psi^*) = x_0(X, A, \psi^*)$ equals $X_0^\dagger$ with probability one.

Definition b: We say a known function $\gamma(a, x, \beta)$ satisfying $\gamma(a, x, \beta) = 0$
if $a = 0$ or $\beta = 0$ is a correctly specified SNMM for $Y_0 \mid X_0$ if, for some $\beta^*$,
$\gamma(A, X, \beta^*) = \gamma^\dagger(A, X)$ with probability 1.

Define $Y_0(\beta^*, \psi^*) = Y - \gamma(A, X_0(\psi^*), \beta^*)$.

It is obvious from Definitions a and b that there always exist correctly
specified SNMMs for $Y_0 \mid X_0$ under Definition b and correctly specified SNFTMs for
$X_0$ under Definition a, since $\gamma^\dagger(A, X)$ and $x_0^\dagger(X, A)$ are well defined functions
of $(F, F_0)$ satisfying $\gamma^\dagger(0, X) = 0$ and $x_0^\dagger(X, 0) = X$ with probability one,
where $F$ and $F_0$, respectively, denote the joint distribution of $(Y, X, A)$ and
of $(Y_0, X_0, A)$. Note $\gamma^\dagger(A, X)$ and $x_0^\dagger(X, A)$ do not depend on the conditional
joint distribution of $\{(Y, X), (Y_0, X_0)\}$ given $A$. This is as desired, as this joint
distribution is not non-parametrically identified from data $(Y, X, A)$ even when $A$
is randomly assigned.

Thus it only remains to show the logical equivalence of the original and
alternative definitions of a SNFTM for $X_0$ and a SNMM for $Y_0 \mid X_0$.

The following Lemma shows that the alternative definitions of a SNFTM for
$X_0$ and a SNMM for $Y_0 \mid X_0$ imply the previous definitions.

Lemma: Suppose $x_0(t, a, \psi)$ is a correctly specified SNFTM for $X_0$ as
defined in Definition a. Then $X_0(\psi^*) \mid A$ has the same distribution as $X_0 \mid A$.
Further assume that $\gamma(a, x, \beta)$ is a correctly specified SNMM for $Y_0 \mid X_0$ as
defined in Definition b. Then $E[Y_0(\beta^*, \psi^*) \mid A, X_0(\psi^*) = x] = E[Y_0 \mid X_0 = x, A]$.

Proof: The first result follows immediately from $X_0^\dagger$ and $X_0$ having the same
conditional distribution given $A$. The second result follows from

$$E[Y - \gamma^\dagger(A, x) \mid X_0^\dagger = x, A] = E[Y_0 \mid X_0 = x, A].$$

Finally, the following Lemma shows that the original definitions imply the
alternative definitions.

Lemma: Suppose $x_0(t, a, \psi)$ is monotone increasing in $t$ for each $(a, \psi)$,
satisfying $x_0(t, a, \psi) = t$ if $a = 0$ or $\psi = 0$. Further suppose that $X_0(\psi^*) \mid A$ has
the same distribution as $X_0 \mid A$ wp1, where $X_0(\psi) = x_0(X, A, \psi)$. Then $X_0(\psi)$
is a correctly specified SNFTM for $X_0$ under Definition a. In addition, suppose
that $\gamma(a, x, \beta)$ is a function satisfying $\gamma(a, x, \beta) = 0$ if $a = 0$ or $\beta = 0$. Suppose
$E[Y - \gamma(A, x, \beta^*) \mid X_0^\dagger = x, A = a] = E[Y_0 \mid X_0 = x, A = a]$ for all $(x, a)$ in a set
of probability 1 under the law of $(X_0, A)$. Then $\gamma(a, x, \beta)$ is a correctly specified
SNMM for $Y_0 \mid X_0$ under Definition b.

Proof: The proof of the first part follows from the well known result that
$X_0^\dagger = x_0^\dagger(X, A)$ is the only function $h(X, A)$ of $(X, A)$ satisfying $h(X, A) \mid A$ has
the same distribution as $X_0 \mid A$ wp1. The second part is proved by showing that
$\gamma^\dagger(a, x)$ is the unique function $h(a, x)$ satisfying
$E[Y - h(A, x) \mid X_0^\dagger = x, A = a] = E[Y_0 \mid X_0 = x, A = a]$ for all $(x, a)$ in a set of
probability 1, as in Refs (8,10).

Are $\gamma^\dagger(a, x)$, $x_0^\dagger(x, a)$, and $E[Y_0]$ nonparametrically identified from data
$(Y, X, A)$ under our assumptions? In this subsection, we finally uncover
some sleight of hand that provided us with such seemingly magical results.
Although we restrict our discussion to the special case of a time-independent
exposure, similar results apply in the general case. Specifically, we will show that
$\gamma^\dagger(a, x)$, $x_0^\dagger(x, a)$, and $E[Y_0]$ are not identified by the distribution of $(Y, X, A)$
under the RC and CD assumptions. Previously, we saw that $\gamma^\dagger(a, x)$, $x_0^\dagger(x, a)$,
and $E[Y_0]$ are identified and equal $\gamma(a, x, \beta^*)$, $x_0(x, a, \psi^*)$, and $E[Y - \gamma(A, X, \beta^*)]$,
respectively, when we assume a correctly specified SNFTM $x_0(x, a, \psi)$ for $X_0$
and a SNMM $\gamma(a, x, \beta)$ for $Y_0 \mid X_0$ whose true parameters $\psi^*$ and $\beta^*$ are
identified (by g-estimation). It follows that identification of $\gamma^\dagger(a, x)$, $x_0^\dagger(x, a)$, and
$E[Y_0]$ must result from the functional form restrictions encoded in our models
$x_0(x, a, \psi)$ and $\gamma(a, x, \beta)$. It follows that if we make the restrictions imposed
by our models less rigid by adding additional parameters, we can lose
identification of $\gamma^\dagger(a, x)$, $x_0^\dagger(x, a)$, and $E[Y_0]$. This loss of identification occurs when, in
an infinite sample size, more than one combination of parameters, say the true
parameters $(\psi^*, \beta^*)$ and the false parameters $(\psi^{**}, \beta^{**})$, both make the score
tests in our g-estimation procedures exactly zero for all choices of $Q_m(\beta)$ linear
in $Y_m(\beta)$ and all choices of $Q_m^{**}(\psi)$. This loss of identification can be expressed
by saying that the data (even were the sample size infinite) cannot be used
to determine whether the true causal quantities are $\gamma(a, x, \beta^*)$, $x_0(x, a, \psi^*)$, and
$E[Y - \gamma(A, X, \beta^*)]$ versus $\gamma(a, x, \beta^{**})$, $x_0(x, a, \psi^{**})$, and $E[Y - \gamma(A, X, \beta^{**})]$.

In contrast, $\gamma^\dagger(a, x)$, $x_0^\dagger(x, a)$, and $E[Y_0]$ are identified under the
comparability assumption that $(Y_0, X_0)$ is independent of $A$, without any reliance on
the functional form restrictions encoded in our models. However, in contrast
with assumption RC, this comparability assumption contradicts our substantive
knowledge, as it implies no unmeasured confounding by undiagnosed chronic
disease.

The problem of lack of identification under the RC and CD assumptions
has little to do with the question of local rank preservation. Suppose we have
assumed a correctly specified SNFTM $x_0(x, a, \psi)$ for $X_0$ and we do not assume
RP. Suppose in truth RP holds. Nonetheless, a second investigator who
assumes the RP version of the SNFTM gains nothing thereby in regard to
the estimation of $x_0^\dagger(x, a)$: the causal quantity $x_0^\dagger(x, a)$ is identified under the
non-rank preserving SNFTM if and only if it is identified under the RP SNFTM.
However, a small amount could be gained by assuming rank preservation for a
SNMM; in rare cases, by assuming RP, a non-identifiable SNMM can become
identifiable, as one can then use non-linear functions $Q_m(\beta)$ of $Y_m(\beta)$ in g-estimation.
But this advantage is not actually due to rank preservation. Rather it is due
to the fact that an RP SNMM is actually a special case of a structural nested
distribution model (SNDM) as defined in Refs (5) and (7). Our SNMM
$\gamma(a, x, \beta)$ for $Y_0 \mid X_0$ is a SNDM if $Y - \gamma(A, X, \beta^*)$ is independent (rather
than just mean independent) of $A$ given $X$. It is this independence (rather than
rank preservation) that licenses the use of non-linear functions $Q_m(\beta)$ of $Y_m(\beta)$
in g-estimation.

Non Identifiability of $\gamma^\dagger(a, x)$, $x_0^\dagger(x, a)$, and $E[Y_0]$: Suppose we do not
impose a SNFTM for $X_0$ or a SNMM for $Y_0 \mid X_0$. Then, it is clear that all we can
conclude under assumptions RC and CD is that $X_0^\dagger = x_0^\dagger(X, A)$ and $A = A(0)$
are independent given $X_0^\dagger > \zeta$ and

$$E[Y - \gamma^\dagger(A, x) \mid A(0), X_0^\dagger = x, X_0^\dagger > \zeta] = E[Y - \gamma^\dagger(A, x) \mid X_0^\dagger = x, X_0^\dagger > \zeta].$$

As a consequence, our parameter of interest $E[Y_0]$ is not identified. Specifically,
under RC and CD, with $p = pr(A = 0)$,

$$E[Y_0] = \qquad (38)$$
$$E[Y \mid X > \zeta, A = 0]\,\big\{pr[X > \zeta \mid A = 0]\,p + \{1 - pr[X_0^\dagger < \zeta \mid A \ne 0]\}(1 - p)\big\} \qquad (39)$$
$$+\, E[Y \mid X < \zeta, A = 0]\,pr[X < \zeta \mid A = 0]\,p \qquad (40)$$
$$+\, E[\{Y - \gamma^\dagger(A, X_0^\dagger)\} \mid X_0^\dagger < \zeta, A \ne 0]\,pr[X_0^\dagger < \zeta \mid A \ne 0]\,(1 - p). \qquad (41)$$

However the quantities

$$pr[X_0^\dagger < \zeta \mid A \ne 0] = pr[X_0 < \zeta \mid A \ne 0], \qquad (42)$$

$$E[\{Y - \gamma^\dagger(A, X_0^\dagger)\} \mid X_0^\dagger < \zeta, A \ne 0] = E[Y_0 \mid X_0 < \zeta, A \ne 0] \qquad (43)$$

are not identified under the RC and CD assumptions. It suffices to show this
when RP holds. So, for the moment assume RP. Because both quantities (42)
and (43) refer to the distribution of the counterfactual responses $(Y_0, X_0)$ under
no exposure (no weight gain) among those who actually were exposed ($A \ne 0$),
we need an assumption to identify them under RP. But under RC, we only have
comparability conditional on a value of $U(0)$, which is unknown when $X_0 < \zeta$,
so identification fails.

When we additionally assume a SNFTM for $X_0$ and a SNMM for $Y_0 \mid X_0$,
we may or may not obtain identification of $E[Y_0]$ depending on whether the
additional functional form restrictions encoded in the models suffice to identify
the quantities (42) and (43) by allowing us to extrapolate from $X_0 > \zeta$, where
we have comparability (since, by CD, $U(0) = 0$), to $X_0 < \zeta$, where we do not.
To clarify this last statement, consider the following RP SNM for $Y_0 \mid X_0$:
$Y_0 = Y - \gamma(A, X_0, \beta^*)$ with

$$\gamma(A, X_0, \beta) = \beta_0 A\, I(X_0 < \zeta) + \beta_1 A\, I(X_0 > \zeta). \qquad (44)$$

Under assumptions RC and CD, even if we unrealistically suppose that data
on $X_0$ was available for all subjects, we could not identify $\beta^* = (\beta_0^*, \beta_1^*)^T$,
because $\beta_0^*$ would not be identified, although $\beta_1^*$ would be identified. This
follows from the fact that, under RC and CD, no subject with $X_0 < \zeta$ may
contribute to g-estimation of $\beta^*$. As a consequence we cannot identify $E[Y_0]$
because $Y_0 = Y - \beta_0^*$ is not estimable on the subset of exposed subjects ($A = 1$)
with $X_0 < \zeta$.

In contrast, were data on $X_0$ available, $\beta^*$ and $E[Y_0]$ are identified in the RP
SNM $\gamma(A, X_0, \beta) = \beta_0 A + \beta_1 A X_0$ because both $\beta_0^*$ and $\beta_1^*$ can be estimated by
g-estimation restricted to subjects with $X_0 > \zeta$. Thus $Y_0 = Y - \beta_0^* A - \beta_1^* A X_0$ can
be estimated for all subjects, including those with $A = 1$ and $X_0 < \zeta$, because,
by having the same parameters apply to subjects with $X_0 < \zeta$ as to subjects with
$X_0 > \zeta$, the model allows extrapolation from subjects with $X_0 > \zeta$ to subjects
with $X_0 < \zeta$. One must weigh the benefit of extrapolation that comes with
assuming model $\gamma(A, X_0, \beta) = \beta_0 A + \beta_1 A X_0$ against the risk that the model
is misspecified for subjects with $X_0 < \zeta$, as would be the case were the true
model $\gamma(A, X_0, \beta^*) = \beta_0^* A\, I(X_0 > \zeta) + \beta_1^* A X_0\, I(X_0 > \zeta) + \beta_2^* A\, I(X_0 < \zeta) +
\beta_3^* A X_0\, I(X_0 < \zeta)$ with $\beta_2^*$ very different from $\beta_0^*$ and with $\beta_3^*$ very different
from $\beta_1^*$. Then the extrapolated value $Y - \beta_0^* A - \beta_1^* A X_0$ for $Y_0$ based on the
misspecified model would be a badly biased estimate of the true $Y_0$ for subjects
with $A = 1$ and $X_0 < \zeta$. Yet, because the model $\gamma(A, X_0, \beta) = \beta_0 A + \beta_1 A X_0$
is correct for subjects with $X_0 > \zeta$, there exists no valid test of model fit that
could detect the biased extrapolation when we only assume RC and CD.
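The extrapolation risk just described can be sketched numerically. The example below assumes hypothetical data in which the two-parameter model $\beta_0 A + \beta_1 A X_0$ is correct for $X_0 > \zeta$ (with $\beta_0^* = -2$, $\beta_1^* = 0$) but the true effect for $X_0 < \zeta$ is $+8$; $X_0$ is unrealistically treated as observed, and all numeric values and names are illustrative choices only.

```python
import numpy as np

def g_estimate_linear(y, a, x0, zeta):
    """Solve sum over {x0 > zeta} of (A - mean A) [1, X0]^T (Y - b0 A - b1 A X0) = 0."""
    keep = x0 > zeta
    yk, ak, xk = y[keep], a[keep], x0[keep]
    g = ak - ak.mean()                                   # centered exposure
    q = np.column_stack([np.ones_like(xk), xk])          # Q* = [1, X0]
    s = np.column_stack([ak, ak * xk])                   # Y0(b) = Y - s @ b
    lhs = (q * g[:, None]).T @ s
    rhs = (q * g[:, None]).T @ yk
    return np.linalg.solve(lhs, rhs)

rng = np.random.default_rng(2)
n, zeta = 200_000, 6.0
u = rng.random(n) < 0.3                                  # unmeasured confounder
x0 = np.where(u, rng.uniform(0.0, zeta, n), rng.uniform(0.0, 40.0, n))
a = (rng.random(n) < np.where(u, 0.2, 0.6)).astype(float)
y0 = 0.2 * x0 + rng.normal(0.0, 1.0, n)                  # counterfactual outcome
effect = np.where(x0 > zeta, -2.0, 8.0)                  # true effect differs below zeta
y = y0 + effect * a

b = g_estimate_linear(y, a, x0, zeta)                    # fitted only where X0 > zeta
y0_extrapolated = y - b[0] * a - b[1] * a * x0           # applied to all subjects
bias = float(y0_extrapolated.mean() - y0.mean())
print(b, bias)
```

Here the fitted parameters are close to the values that hold for $X_0 > \zeta$, yet the extrapolated estimate of $E[Y_0]$ is off by roughly ten times the proportion of exposed subjects with $X_0 < \zeta$, and nothing in the restricted fit signals the problem.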

Suppose now, as is true in practice, data on $X_0$ are unavailable for subjects
with $A = 1$. Then, under assumptions RC and CD, without the help of a correct
RP SNFTM for $X_0$ whose functional form provides for extrapolation, we can
no longer identify any aspect of the distribution of $Y_0$ for any identifiable subset
of subjects with $A \ne 0$. This is because, although we know that the identified
quantity $E[Y \mid X > \zeta, A = 0]$ equals $E[Y_0 \mid X_0 > \zeta, A \ne 0]$, we cannot identify
which subjects with $A \ne 0$ have $X_0 > \zeta$.

In summary, in the realistic setting of longitudinal time-dependent exposures,
the possibility of sensitivity of one's estimate of $E[Y_0]$ to model extrapolation
should be examined by reestimating $E[Y_0]$ under a variety of models that
differ in both the dimension of the parameter vectors and in functional form.

A final point is that no individual who has developed a chronic disease by
time $m$ is included in our g-estimation procedure at $m$ because $X_m(\psi) = X < m + \zeta$
for such subjects. Thus our estimate of the effect of exposure at time $m$ on
a subject with a chronic disease at $m$ is identified wholly by extrapolation from
the effect on subjects without chronic disease at $m$. One approach to lessening
the degree of extrapolation is to require a subject to be rather ill before they
meet the definition of having a diagnosed chronic disease. For example, mild to
moderate diabetes or hypertension need not qualify as having a chronic disease,
especially if regular data on blood pressure and blood glucose have been recorded
in the data base, as unmeasured confounding by undiagnosed mild to moderate
diabetes or hypertension should then be minimal. If our definition of a diagnosed
chronic disease is sufficiently stringent, then few subjects who meet the definition
at $m$ will be observed to gain weight subsequent to $m$. In that case, model-based
extrapolation must be minimal: any model-based extrapolation is restricted to
those gaining weight at $m$, because our models are models for the causal effect
of weight gain (not loss) at $m$. In Section 3.3 we offer a different approach to
lessening our reliance on correct model specification.

3.2.5 Can we replace $X_m$ by $X$ Revisited:

We revisit the issue of whether we could have replaced $X_m$ by the observed
$X$ in the CD assumption if we are willing to assume a SNFTM for $X_m$ so
as to link the distribution of $X$ with that of $X_m$. We take the observed data
to be $(\bar A(K), \bar L(K+1), Y, X)$. We will study the implications of two different
SNFTMs. The first SNFTM is the model discussed above that assumes $X_m(\psi^*)$
and $X_m$ have the same conditional distribution given $(\bar L(m), \bar A(m))$. The
second assumes $X_m(\psi^*)$ and $X_m$ have the same conditional distribution given
$(\bar L(m), \bar A(m), U(m) = 0)$. In both cases $X_m(\psi)$ is defined by Eqs (28)-(29).
Note a locally RP SNFTM implies $X_m(\psi^*) = X_m$ and thus both models are
true. When rank preservation does not hold, the truth of one model does not
imply the truth of the other. We first show that when rank preservation does
not hold, under the RC assumption and the modified CD assumption in which
$X_m$ is replaced by the observed $X$, the parameter $\psi^*$ of the first SNFTM may
not be identifiable; however, the parameter of the second model is estimable
by g-estimation. Thus one might suppose we could impose the modified CD
assumption and the second model in lieu of the unmodified CD assumption and
the first model. However, we shall see this approach has a drawback: knowledge
of the parameter $\psi^*$ of the second model, in contrast to that of the first model,
does not help identify the parameter of interest $E[Y_0]$.

We now show that $\psi^*$ is identifiable in the second SNFTM under the RC
assumption and the modified CD assumption. Note $X > \zeta + m$ is equivalent to
$X_m(\psi) = m + \int_m^X \exp\{\omega(A(t), \bar L(t), \psi)\}\,dt > m + \int_m^{\zeta+m} \exp\{\omega(A(t), \bar L(t), \psi)\}\,dt$.
Thus, the modified CD assumption implies that whenever
$X_m(\psi) > m + \int_m^{\zeta+m} \exp\{\omega(A(t), \bar L(t), \psi)\}\,dt$, we have $U(m) = 0$.
However, even if we made the rank preservation assumption that
$X_m(\psi^*) = X_m$, we cannot therefore conclude from the RC assumption that $A(m)$
is independent of $X_m(\psi^*)$ given

$(\bar L(m), \bar A(m-1), X_m(\psi^*) > m + \int_m^{\zeta+m} \exp\{\omega(A(t), \bar L(t), \psi^*)\}\,dt)$;

although this conditioning event indeed implies $U(m) = 0$, nonetheless, the
conditioning event also depends on $A(t)$ for $t > m$, while the conditioning events
in the RC assumption do not.

However, if we let $d(m, \psi, \zeta)$ be the maximum value of $X_m(\psi)$ among all
subjects with $m < X < \zeta + m$ (i.e., subjects with
$m < X_m(\psi) < m + \int_m^{\zeta+m} \exp\{\omega(A(t), \bar L(t), \psi)\}\,dt$), then
$X_m(\psi) > d(m, \psi, \zeta)$ implies $X > \zeta + m$ and thus $U(m) = 0$. Thus, we can
conclude from the RC assumption that, under a rank preserving model, $A(m)$ and
$X_m(\psi^*)$ are independent given
$(\bar L(m), \bar A(m-1), X_m(\psi^*) > d(m, \psi^*, \zeta))$, since $d(m, \psi, \zeta)$ does not
vary among the subjects. (Technically, this independence only holds if we replace
$d(m, \psi, \zeta)$ by its probability limit. But this distinction is unimportant for
inference because $d(m, \psi, \zeta)$ converges to its probability limit at a rate even
faster than $n^{1/2}$ under mild regularity conditions.) Thus, given a rank preserving
SNFTM, we can use g-estimation to obtain a CAN estimate $\hat\psi$ of $\psi^*$ under the
RC and modified CD assumptions. Specifically, $\hat\psi$ is the $\psi$ for which the 5 degree
of freedom score test of the hypothesis $\theta = 0$ is precisely zero in the model

$E[A(m) \mid \bar L(m), \bar A(m-1), \Xi(m) = 1, X_m(\psi), X_m(\psi) > d(m, \psi, \zeta)]$
$= \alpha^T W(m) + \theta^T Q_m(\psi)$.
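
The idea of choosing $\psi$ so that the score statistic for $\theta = 0$ vanishes can be
illustrated in a much simpler setting than the one treated in this paper. The
following is only a minimal sketch under assumptions that are not part of the
original analysis: a single exposure time, a binary exposure `A`, one binary
measured confounder `L`, no censoring, no restriction to $X_m(\psi) > d(m, \psi, \zeta)$,
and a rank-preserving accelerated failure time model in which the exposure-free
time is $X \exp(\psi A)$. The function names `simulate` and `g_estimate`, the
simulated parameter values, and the choice $Q(\psi) = X \exp(\psi A)$ are all
hypothetical.

```python
import math
import random


def simulate(n, psi_true, seed=0):
    """Simulate (L, A, X) under a toy rank-preserving failure time model."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        L = 1 if rng.random() < 0.5 else 0             # measured confounder
        A = 1 if rng.random() < 0.3 + 0.4 * L else 0   # exposure depends on L only
        mean_x0 = 5.0 if L else 10.0                   # L also predicts the outcome
        x0 = rng.expovariate(1.0 / mean_x0)            # exposure-free failure time
        X = x0 * math.exp(-psi_true * A)               # exposure shortens observed time
        data.append((L, A, X))
    return data


def g_estimate(data, lo=-3.0, hi=3.0, tol=1e-8):
    """Return the psi at which the score for theta = 0 is zero.

    The working model is E[A | L, Q(psi)] = alpha_L + theta * Q(psi) with
    Q(psi) = X * exp(psi * A); E[A | L] is fit by stratum means of A.
    """
    count = {0: 0, 1: 0}
    exposed = {0: 0, 1: 0}
    for L, A, _ in data:
        count[L] += 1
        exposed[L] += A
    p = {l: exposed[l] / count[l] for l in (0, 1)}

    def score(psi):
        # Sum of (A - fitted E[A | L]) * Q(psi); increasing in psi.
        return sum((A - p[L]) * X * math.exp(psi * A) for L, A, X in data)

    while hi - lo > tol:          # bisection on the monotone score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


data = simulate(20000, psi_true=0.3, seed=1)
psi_hat = g_estimate(data)
```

In this toy example the score is monotone in `psi`, so bisection suffices; in the
multi-parameter, time-varying procedure described in the text a general root
finder or grid search over $\psi$ would be needed instead.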

Suppose now rank preservation is absent. If we assume the second SNFTM, we
know $X_m(\psi^*)$ and $X_m$ have the same distribution given
$\bar L(m), \bar A(m), U(m) = 0$. Thus, by the RC assumption, $A(m)$ and $X_m(\psi^*)$ are
independent given $(\bar L(m), \bar A(m-1), X_m(\psi^*) > d(m, \psi^*, \zeta), U(m) = 0)$.
Hence $A(m)$ and $X_m(\psi^*)$ are independent given
$(\bar L(m), \bar A(m-1), X_m(\psi^*) > d(m, \psi^*, \zeta))$, since the event
$\{\bar L(m), \bar A(m-1), X_m(\psi^*) > d(m, \psi^*, \zeta)\}$ is equivalent to the event
$\{\bar L(m), \bar A(m-1), X_m(\psi^*) > d(m, \psi^*, \zeta), U(m) = 0\}$. So $\hat\psi$
generally remains CAN for $\psi^*$.

We next show that $\psi^*$ is not identifiable in the first SNFTM under the
RC assumption and the modified CD assumption. Under the first SNFTM,
we only know $X_m(\psi^*)$ and $X_m$ have the same distribution given
$\bar L(m), \bar A(m)$. Thus $X_m(\psi^*) \mid \bar L(m), \bar A(m), X_m(\psi^*) > d(m, \psi^*, \zeta)$
has the same distribution as $X_m \mid \bar L(m), \bar A(m), X_m > d(m, \psi^*, \zeta)$.

Thus, by equivalence of the conditioning events, both
$X_m(\psi^*) \mid \bar L(m), \bar A(m), X_m(\psi^*) > d(m, \psi^*, \zeta)$ and
$X_m(\psi^*) \mid \bar L(m), \bar A(m), X_m(\psi^*) > d(m, \psi^*, \zeta), U(m) = 0$ have
the same distribution as $X_m \mid \bar L(m), \bar A(m), X_m > d(m, \psi^*, \zeta)$. However,
under the first SNFTM and without rank preservation, this equality does not allow
us to invoke the RC assumption, since the conditioning event
$\{\bar L(m), \bar A(m), X_m > d(m, \psi^*, \zeta), U(m) = 0\}$ in that assumption differs
from the conditioning event $\{\bar L(m), \bar A(m), X_m > d(m, \psi^*, \zeta)\}$. Thus we
cannot conclude $A(m)$ and $X_m(\psi^*)$ are independent given
$(\bar L(m), \bar A(m-1), X_m(\psi^*) > d(m, \psi^*, \zeta))$, and so $\hat\psi$ will not
be CAN for $\psi^*$ under the first SNFTM. Indeed identification is not possible.

Finally we argue that knowledge of the parameter $\psi^*$ of the second model,
in contrast to that of the first model, does not help identify $E[Y_0]$. Under the
second model, we only learn the causal effect of treatment at time $m$ among
those with $U(m) = 0$. This does not allow us to estimate the distributions of $X_m$
and thus $Y_m$ for all subjects. In fact, the counterfactual distributions of $X_m$ and
thus $Y_m$ are not even identified in those with $U(m) = 0$ for $m < K$, because the
distributions of $X_K$ and thus $Y_K$ are not identifiable in those with $U(m) = 0$
but $U(K) \neq 0$. One way to understand the difference is that the second model
does not allow for the extensive model-based extrapolation that the first model
does. Whether that is viewed as a drawback of the second model clearly depends
on one's faith in versus skepticism about model-based extrapolation.

3.3 Intractable Confounding In Subgroups: 

Our comparability assumption RC that $A(m)$ is statistically independent of
$(Y_m, X_m)$ given both $\bar L(m)$ and $\bar U(m) = \bar 0(m)$ at time $m$ may not be
reasonable for particular, identifiable subgroups of the study population. That is,
there may be identifiable subgroups in whom confounding by unmeasured factors is
intractable, where, by definition, a subgroup is identifiable at time $m$ if
membership in the subgroup is determined by the measured variables $\bar L(m)$. In
Section 2.2.3, we noted that possible examples of such subgroups include subjects
with a diagnosed chronic disease, an age of greater than 70, or a BMI below 21. In
fact, since we have assumed $U(m) = 1$ whenever $X < m$, we have all along
been assuming intractable confounding in the identifiable subgroup consisting
of those alive with a diagnosed chronic disease at $m$ ($X < m$, $T > m$). We have
therefore been excluding them from our g-estimation procedure by requiring
$X_m > m + \zeta$ for inclusion. Recall that if $X < m$, then $X = X_m$.

Suppose therefore we wish to conduct an analysis where no comparability
assumption (neither CO nor RC) is assumed at time $m$ for subjects who, at $m$,
have an age of greater than 70, or a BMI below 21. To do so, as described in Ref.
(16), we simply redefine $\Xi(m)$ to be zero for such subjects regardless of whether
or not their $BMI(m+1) > BMI_{\max}(m)$, so that they too are excluded from
contributing to g-estimation at time $m$. In so doing, we do not change the models
being fit, the interventions under consideration, or the parameter of interest
$E[Y_0]$. Rather we only change, by decreasing, the number of person-time
observations used to estimate our model parameters. We thereby sacrifice some
power and efficiency. As a consequence, even were we willing to make assumption
CO for the remaining subjects with $\Xi(m) = 1$, $E[Y_0]$ would no longer be
nonparametrically identified, because model-based extrapolation is now being used
for identification.

In contrast to g-estimation of SNMs, when confounding by unmeasured factors
is present in certain subgroups of the study population, neither IPTW estimation
nor the parametric g-formula estimator can be used to estimate $E[Y_0]$.

If a substantial fraction of the total person time is accrued by subjects in
identifiable subgroups with intractable confounding, then either identification
will fail or, more often, the validity of one's estimate of $E[Y_0]$ will rely heavily
on model extrapolation. One, albeit not altogether satisfactory, way to decrease
the reliance on model extrapolation is to give up the attempt to estimate the
parameter of interest $E[Y_0]$. Instead, let $IN(m)$ be the indicator of intractable
confounding in identifiable subgroups that takes the value 1 if at time $m$ a
subject is in an identifiable subgroup with intractable confounding and 0 otherwise.
Note that, based on the above discussion, subjects alive at $m$ with $X < m$ have
$IN(m) = 1$.

Define $Y_m^\dagger$ to be one's counterfactual outcome when following the time
$m^\dagger$ dietary intervention in which a subject follows his observed diet up through
month $m$ and is thereafter weighed daily. On any day in month $k > m$ that
his weight exceeds his previous maximum monthly weight, the subject's caloric
intake is restricted whenever $IN(k) = 0$. However, during months in which a
subject is in an intractable subgroup [$IN(k) = 1$], we place no restrictions on
his diet or weight gain, reflecting the fact that due to intractable confounding,
we are unable to estimate the effect of preventing weight gain among subjects
with $IN(m) = 1$, except by model extrapolation.

Our new goal becomes to estimate $E[Y_0^\dagger]$, the mean utility under an
intervention in which, starting at age 18, each time $m$ a subject with $IN(m) = 0$
exceeds his past maximum BMI, we calorie restrict him to prevent further
weight gain. To estimate $E[Y_0^\dagger]$ by g-estimation we proceed exactly as above
except (i) we define new variables $A^\dagger(m)$ and $\Xi^\dagger(m)$ that equal $A(m)$ and
$\Xi(m)$ whenever $IN(m) = 0$ but are zero whenever $IN(m) = 1$, and (ii) everywhere
replace $A(m)$ and $\Xi(m)$ in our g-estimation procedure by $A^\dagger(m)$ and
$\Xi^\dagger(m)$. Then, our algorithm that had estimated $E[Y_0]$ will now output an
estimator of $E[Y_0^\dagger]$. In summary, at the cost of estimating a parameter
$E[Y_0^\dagger]$ of lesser interest than $E[Y_0]$, we have eliminated the model
extrapolation required to estimate the effect of weight gain among subjects with
$IN(m) = 1$.
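
Step (i) above amounts to a simple masking of the exposure and inclusion
variables. The following minimal sketch assumes one subject's monthly values are
stored in parallel Python lists; the function name `restrict_to_tractable` and
this data layout are hypothetical and only illustrate the redefinition, not the
full estimation procedure.

```python
def restrict_to_tractable(A, Xi, IN):
    """Return the masked exposure and inclusion histories for one subject.

    A[m]  : observed exposure (weight gain) in month m
    Xi[m] : inclusion indicator for g-estimation in month m
    IN[m] : 1 if the subject is in an intractably confounded subgroup in month m
    The returned lists equal A and Xi when IN[m] == 0 and are zero when IN[m] == 1.
    """
    if not (len(A) == len(Xi) == len(IN)):
        raise ValueError("A, Xi and IN must cover the same months")
    A_dagger = [0.0 if flag else a for a, flag in zip(A, IN)]
    Xi_dagger = [0 if flag else x for x, flag in zip(Xi, IN)]
    return A_dagger, Xi_dagger


# Example: the subject enters an intractable subgroup in the third month.
A_dagger, Xi_dagger = restrict_to_tractable([0.2, 0.0, 0.5], [1, 0, 1], [0, 0, 1])
```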

However, the procedure in the preceding paragraph has not eliminated the
model extrapolation required to estimate the effect of weight gain among the
intractably confounded nonidentifiable subgroup defined by $m < X_m < m + \zeta$. As
a consequence $E[Y_0^\dagger]$, like $E[Y_0]$, fails to be nonparametrically identified and
must rely on model extrapolation for identification. Specifically, the subgroup
with $m < X_m < m + \zeta$ is intractably confounded by $U(m)$. It is not identifiable
because the observed data cannot determine membership. For example, among
subjects with $A^\dagger(m) > 0$, we cannot determine if a subject with $X$ observed to
be between $m$ and $m + \zeta$ is a subject with $m < X_m < m + \zeta$ versus a subject
with $X_m > m + \zeta$ with $X$ occurring before $m + \zeta$ owing to the causal effect of
his weight gain $A^\dagger(m)$. As a consequence it is not possible to assign all
members of the intractably confounded subgroup with $m < X_m < m + \zeta$ the value
$IN(m) = 1$, while assigning all members of the unconfounded subgroup with
$m + \zeta < X_m$ the value $IN(m) = 0$. The latter subgroup is unconfounded under
the RC assumption because $m + \zeta < X_m$ implies $\bar U(m) = \bar 0(m)$ by the CD
assumption.

In fact, a minimal latent period with length $x > \zeta$ is required for
nonparametric identification of $E[Y_0^\dagger]$. For the remainder of this subsection,
assume such a MLP. Then subjects with $m < X_m < m + \zeta$ form an identifiable
subgroup, as $m < X < m + \zeta$ and $m < X_m < m + \zeta$ are equivalent. Similarly,
subjects with $X_m > m + \zeta$ now form an identifiable subgroup. Thus we can now
assign $IN(m) = 1$ to all subjects in the confounded subgroup $m < X_m < m + \zeta$
and $IN(m) = 0$ to all members of the subgroup $X_m > m + \zeta$ who were not
already known to have $IN(m) = 1$ by virtue of membership in some other
intractably confounded subgroup (e.g., age greater than 70). Once we have
assigned all members of the subgroup $m < X_m < m + \zeta$ the value $IN(m) = 1$, our
time $m^\dagger$ dietary interventions no longer restrict the diet of any subject of any
intractably confounded subgroup. As a consequence $E[Y_0^\dagger]$ is now
nonparametrically identified. A formal proof is given in the appendix where it is
also shown that, owing to the nonparametric identification, $E[Y_0^\dagger]$ can be
estimated using the parametric g-formula estimator and the IPTW estimator, as well
as by g-estimation of structural nested models.



4 Censoring: 

We now consider the realistic setting in which the available data are
$O = (\bar A(K), \bar L(K+1), Y, X I(X < K+1))$, indicating that $X$ is not observed in
subjects for whom $X$ exceeds the end of follow up time $K + 1$. For such censored
subjects, $X_m(\psi)$ is not observed. As a consequence g-estimation as described
above cannot be done. We will describe a modified estimation procedure that
can be validly applied to censored data. In the interest of brevity, we only
consider a procedure that is easy to describe. The downside is that the procedure
we describe is not as efficient as other more complex procedures.

Given a SNFTM for $X_0$ we can still use g-estimation to obtain CAN estimates
$\hat\psi$ of $\psi^*$ from censored data by everywhere replacing $X_m(\psi)$ by
$C_m(\psi) = \min(X_m(\psi), K_m(\psi))$ in the g-estimation procedure, where

$K_m(\psi) = m + \min_{\{i;\, X_i > K+1\}} \left\{ \int_m^{K+1} \exp\{\omega(A_i(t), \bar L_i(t), \psi)\}\,dt \right\}$    (45)

is the smallest possible value of $X_m(\psi)$ any censored subject could possibly have
(as $m + \int_m^{K+1} \exp\{\omega(A(t), \bar L(t), \psi)\}\,dt$ would be $X_m(\psi)$ for a given
censored subject had he died, unbeknownst to us, immediately after end of follow
up). Note $C_m(\psi) > m + \zeta$ implies $X_m(\psi) > m + \zeta$, so our g-estimation
procedures remain restricted to subjects with $U(m) = 0$.
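
The artificial censoring in (45) can be sketched in a few lines. This is only an
illustration under assumptions not made in the text: the function
$\omega(A(t), \bar L(t), \psi)$ is taken to be the hypothetical one-parameter form
$\psi A(\lfloor t \rfloor)$ with exposure constant within each month, and each subject is
stored as a pair `(X, A)` with `X = None` when the subject is censored at $K + 1$.
The names `transformed_time` and `censored_times` are likewise hypothetical.

```python
import math


def transformed_time(m, end, A, psi):
    """Return m + integral from m to `end` of exp(psi * A[floor(t)]) dt.

    A is the list of monthly exposures A[0..K]; if end <= m the time is unchanged.
    """
    if end <= m:
        return end
    total = 0.0
    t = m
    while t < end:
        k = math.floor(t)
        nxt = min(k + 1, end)                 # integrate month by month
        total += (nxt - t) * math.exp(psi * A[k])
        t = nxt
    return m + total


def censored_times(m, subjects, K, psi):
    """Return (K_m(psi), [C_m(psi) for each subject])."""
    # Smallest transformed time any censored subject could have had.
    K_m = min(
        (transformed_time(m, K + 1, A, psi) for X, A in subjects if X is None),
        default=math.inf,
    )
    C = []
    for X, A in subjects:
        x_m = math.inf if X is None else transformed_time(m, X, A, psi)
        C.append(min(x_m, K_m))
    return K_m, C


# Toy data with K = 2 (follow-up ends at time 3) and psi = log 2.
subjects = [(None, [1, 0, 0]), (None, [0, 0, 0]), (1.5, [1, 1, 0]), (0.5, [0, 0, 0])]
K_m, C = censored_times(0, subjects, 2, math.log(2.0))
```

Because `C` is computed for every subject, censored or not, it is always
observable, which is what allows it to replace $X_m(\psi)$ in the procedure.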

Similarly, given a SNMM we can still use g-estimation to obtain CAN
estimates $\hat\beta(\hat\psi)$ of $\beta^*$ from censored data by replacing $X_m(\psi)$ by
$C_m(\psi)$ everywhere in the g-estimation procedure. However, there is a subtlety in
interpretation. Specifically, define the function $c_m(x) = \min(x, K_m(\psi^*))$, so
$c_m(X_m(\psi^*)) = C_m(\psi^*)$. Define $C_m = c_m(X_m)$. The correct definition of our
SNMM is

$E[Y_m \mid A(m), \bar L(m), \bar A(m-1), C_m = x]$    (46)
$= E[Y_m(\beta^*, \psi^*) \mid A(m), \bar L(m), \bar A(m-1), C_m(\psi^*) = x]$    (47)

where, now,

$Y_m(\beta, \psi) = Y - \sum_{j=m}^{K} \gamma_j[A(j), \bar A(j-1), \bar L(j), C_j(\psi), \beta]$.    (48)

We refer to this model as a SNMM for $Y_m \mid C_m$. Technical details are
given in the appendix. Finally a CAN estimator of $E[Y_0]$ from censored data
is $\sum_{i=1}^{n} Y_{0,i}(\hat\beta(\hat\psi), \hat\psi)/n$ as before, with $\hat\beta(\hat\psi)$ and
$\hat\psi$ as redefined in this section.



5 Maximum Weight Gain Dietary Intervention 
Regimes 

We use $g_m$ to denote a general maximum weight gain dietary intervention regime
beginning at time $m$. Mathematically $g_m$ is a collection of functions
$g_m = \{g_k[\bar a(k-1), \bar l(k)];\ k = m, \ldots, K\}$. Under a regime $g_m$ a subject
follows his own observed diet history prior to $m$ and then, for $K \geq k \geq m$,
$g_k[\bar a(k-1), \bar l(k)]$ is a non-negative function that specifies the increase in
maximum BMI to be allowed at time $k$ for a subject with past exposure and covariate
history $[\bar a(k-1), \bar l(k)]$. See the definition in the following paragraph for a
precise statement. We use $g$ as shorthand for a regime $g_0$ beginning at time 0. Note
that any regime $g = g_0 = \{g_k[\bar a(k-1), \bar l(k)];\ k = 0, \ldots, K\}$ is naturally
associated with a particular regime $g_m$: the regime
$g_m = \{g_k[\bar a(k-1), \bar l(k)];\ k = m, \ldots, K\}$ where one follows his observed diet
up till time $m$ and then follows regime $g_m$ using functions
$g_k[\bar a(k-1), \bar l(k)]$ specified by $g$ for $k \geq m$. Therefore, we can define
the following counterfactuals.

Let $Y^{g_m}$ be a subject's utility measured at the end of follow-up when the
counterfactual intervention $g_m$ is followed. Similarly, let $BMI^{g_m}(k)$,
$\bar L^{g_m}(k)$, $BMI^{g_m}_{\max}(k)$, $\bar A^{g_m}(k)$ be a subject's BMI, covariate,
maximum BMI and $A$-history through $k$ under $g_m$. Note
$BMI^{g_m}(k) \in \bar L^{g_m}(k)$. Then we have the following formal definition.

Definition of a general time $m$ maximum weight gain dietary intervention
regime $g_m$: The subject follows his observed diet up to time $m$ and from month
$m$ onwards, the subject is weighed every day: (i) if
$A(m) = BMI(m+1) - BMI_{\max}(m) > g_m[\bar A(m-1), \bar L(m)]$, the subject's caloric
intake is restricted until the subject's BMI falls to below
$BMI_{\max}(m) + g_m[\bar A(m-1), \bar L(m)]$; (ii) for $m + 1 \leq k \leq K$, (a) if
$A^{g_m}(k) = BMI^{g_m}(k+1) - BMI^{g_m}_{\max}(k) > g_k[\bar A^{g_m}(k-1), \bar L^{g_m}(k)]$,
the subject's caloric intake is restricted until the subject's BMI falls to below
$BMI^{g_m}_{\max}(k) + g_k[\bar A^{g_m}(k-1), \bar L^{g_m}(k)]$; (b) if his BMI is less than
$BMI^{g_m}_{\max}(k) + g_k[\bar A^{g_m}(k-1), \bar L^{g_m}(k)]$, the subject is allowed to
eat as he pleases without any intervention.

Note, by definition, $\bar L^{g_m}(k)$ equals $\bar L(k)$ and $\bar A^{g_m}(k-1)$ equals
$\bar A(k-1)$ for $k \leq m$. Furthermore, given a regime $g = g_0$, we say a subject's
observed data is consistent with following the associated regime $g_m$ if and only if
$A^{g_m}(k) \leq g_k[\bar A^{g_m}(k-1), \bar L^{g_m}(k)]$ for $k \geq m$. It follows that if a
subject's observed data is consistent with following the associated regime $g_m$,
then the subject's observed data is consistent with following the associated regime
$g_k$ for any $k > m$.

If for all $k \geq m$, $g_k[\bar a(k-1), \bar l(k)]$ is a constant $a(k)$ that does not
depend on $(\bar a(k-1), \bar l(k))$, the regime $g_m$ is said to be non-dynamic or static
and is written $g_m = a(m)$. Otherwise it is dynamic. An intervention that allowed a
BMI gain of 0.1/12 per month (i.e., of 1 per decade) starting at time 0 (age 18) is
the regime $g = a(0)$ with each $a(m) = 0.1/12$. An intervention starting at
time 0 that allows a BMI gain of 0.1/12 per month in subjects free of hypertension,
diabetes, hyperlipidemia, or clinical CHD, but of only 0.05/12 per month
once a subject has developed one of these risk factors, is a dynamic regime $g_0$
with $g_k[\bar a(k-1), \bar l(k)] = 0.1/12$ if $\bar l(k)$ indicates a subject is free at $k$
of hypertension, diabetes, hyperlipidemia, or clinical CHD and
$g_k[\bar a(k-1), \bar l(k)] = 0.05/12$ otherwise.
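
The two example regimes can be written as ordinary functions of the month and
the past histories. The sketch below is illustrative only: the encoding of the
covariate history as a list of dictionaries of risk factor flags, the key names
in `RISK_FACTORS`, and the function names are assumptions introduced here, not
part of the data structures described in the text.

```python
# Hypothetical keys for the risk factors named in the text.
RISK_FACTORS = ("hypertension", "diabetes", "hyperlipidemia", "chd")


def static_regime(k, a_hist, l_hist):
    """Allowed increase in maximum BMI in month k: constant 0.1/12 per month."""
    return 0.1 / 12


def dynamic_regime(k, a_hist, l_hist):
    """Allowed increase in month k, halved once any risk factor is present at k.

    l_hist[k] is assumed to be a dict mapping risk factor names to booleans.
    """
    current = l_hist[k]
    if any(current.get(name, False) for name in RISK_FACTORS):
        return 0.05 / 12
    return 0.1 / 12


# Under the static regime the allowance accumulates to 1 BMI unit per decade.
decade_allowance = sum(static_regime(k, [], []) for k in range(120))
```

Both functions ignore `a_hist`; the argument is kept only so that they share the
general signature $g_k[\bar a(k-1), \bar l(k)]$.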

The expected value $E[Y^g]$ is our parameter of interest associated with the
regime $g$: the expected utility had we placed in 1950 all 18 year old non-smoking
American men on the maximum weight gain intervention regime $g$.

Let $\lfloor t \rfloor$ denote the largest integer less than or equal to $t$ and define
$b^+ = b$ if $b > 0$ and $b^+ = 0$ if $b \leq 0$. Note because data is only obtained
monthly, for any non-negative real number $t$, $A(t) = A(\lfloor t \rfloor)$ and
$\bar L(t) = \bar L(\lfloor t \rfloor)$. Given a regime $g$, let

$A^g_\Delta(t) = [A(\lfloor t \rfloor) - g_{\lfloor t \rfloor}[\bar A(\lfloor t \rfloor - 1), \bar L(\lfloor t \rfloor)]]^+$
$= [BMI(\lfloor t \rfloor + 1) - \{BMI_{\max}(\lfloor t \rfloor) + g_{\lfloor t \rfloor}[\bar A(\lfloor t \rfloor - 1), \bar L(\lfloor t \rfloor)]\}]^+$,

so $A^g_\Delta(t) = 0$ for all $t$ if and only if a subject's observed data is consistent
with following regime $g$ from time 0. When $A^g_\Delta(t) \neq 0$, $A^g_\Delta(t)$ measures
how much greater one's observed weight gain is than the maximum prescribed by $g$.
Define

$X^g_m(\psi) = m + \int_m^X \exp\{\omega(A^g_\Delta(t), \bar A(t^-), \bar L(t), \psi)\}\,dt$ if $X > m$    (49)

$X^g_m(\psi) = X$ if $X \leq m$    (50)

$Y^g_j(\beta, \psi) = Y - \sum_{m=j}^{K} \gamma_m[A^g_\Delta(m), \bar A(m-1), \bar L(m), X^g_m(\psi), \beta]$    (51)

where the functions $\omega(a(t), \bar a(t^-), \bar l(t), \psi)$ and
$\gamma_m(a(m), \bar a(m-1), \bar l(m), \beta)$ are again known functions satisfying
$\omega(a(t), \bar a(t^-), \bar l(t), \psi) = 0$ if $a(t) = 0$ or $\psi = 0$ and
$\gamma_m(a(m), \bar a(m-1), \bar l(m), \beta) = 0$ if $a(m) = 0$ or $\beta = 0$.
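
The excess weight gain $A^g_\Delta$ that enters (49) and (51) is straightforward to
compute from a monthly BMI series. The sketch below makes simplifying assumptions
not in the text: the regime is passed as a function `g(k)` of the month only (as
for a static regime), $BMI_{\max}(k)$ is taken to be the largest recorded monthly
BMI through month $k$, and the names `excess_gain` and `is_consistent` are
hypothetical.

```python
def excess_gain(bmi, g):
    """Return [A_delta(0), ..., A_delta(K)] for one subject.

    bmi : monthly BMI values bmi[0..K+1]
    g   : function giving the allowed increase in maximum BMI in month k
    A_delta(k) is the positive part of BMI(k+1) - (BMI_max(k) + g(k)).
    """
    gains = []
    for k in range(len(bmi) - 1):
        bmi_max = max(bmi[: k + 1])           # maximum BMI through month k
        gains.append(max(0.0, bmi[k + 1] - (bmi_max + g(k))))
    return gains


def is_consistent(gains):
    """True if the observed data are consistent with following the regime."""
    return all(a == 0.0 for a in gains)


# Example with a constant allowance of 0.25 BMI units per month.
gains = excess_gain([22.0, 22.5, 22.4, 23.0], lambda k: 0.25)
```

A subject whose `gains` are all zero contributes an unchanged failure time,
since $\omega$ is zero whenever its first argument is zero.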

Given a regime $g$, we say that (49)-(50) is a correctly specified SNFTM for
$X^g_m$ and (51) is a correctly specified SNMM for $Y^g_m \mid X^g_m$ with true parameters
$(\beta^*, \psi^*)$ when there exists some $(\beta^*, \psi^*)$ such that, for each $m$,

Assumption (i): $X^g_m$ and $X^g_m(\psi^*)$ have the same conditional distribution
given $(A^g_\Delta(m), \bar A(m-1), \bar L(m))$ and

Assumption (ii):

$E[Y^g_m \mid A^g_\Delta(m), \bar A(m-1), \bar L(m), X^g_m = x] = E[Y^g_m(\beta^*, \psi^*) \mid A^g_\Delta(m), \bar A(m-1), \bar L(m), X^g_m(\psi^*) = x]$    (52)

Recall $\bar A(m-1)$ is a function of $\bar L(m)$ and thus its appearance in the
conditioning event is redundant. Define

$\Xi^g(m) = I\{BMI(m+1) > BMI_{\max}(m) + g_m[\bar A(m-1), \bar L(m)]\}$    (53)

so $A^g_\Delta(m) > 0$ implies $\Xi^g(m) = 1$.

Given a regime $g$, let the RC$^g$ assumption be the RC assumption but with
$X^g_m$, $Y^g_m$, $\Xi^g(m)$ replacing their counterparts without $g$ and $A^g_\Delta$
replacing $A$. Let CD$^g$ be the CD assumption but with $X^g_m$ replacing $X_m$ and
"time $m$ dietary intervention" replaced by "$g_m$ dietary intervention". Henceforth
we assume that CD$^g$ and RC$^g$ hold for all regimes $g$.

Suppose we carry out g-estimation as in Section 3 except with $X^g_m(\psi)$,
$Y^g_m(\beta, \psi)$, $\Xi^g(m)$ replacing their counterparts without $g$ and $A^g_\Delta$
replacing $A$. Then results of Robins (4) imply that, under the RC$^g$ and CD$^g$
assumptions, if the model

$E[A^g_\Delta(m) \mid \bar L(m), \bar A(m-1), \Xi^g(m) = 1] = \alpha^T W(m)$

is correct, and our SNFTM for $X^g_m$ and SNMM for $Y^g_m \mid X^g_m$ are correctly
specified, then $\hat\psi$, $\hat\beta(\hat\psi)$, and
$n^{-1} \sum_{i=1}^{n} Y^g_{0,i}(\hat\beta(\hat\psi), \hat\psi)$ are CAN for $\psi^*$, $\beta^*$, and
the parameter of interest $E[Y^g]$ respectively, provided $(\beta^*, \psi^*)$ are
identified and we choose $Q_m(\beta)$ linear in $Y_m(\beta)$.



6 Measurement Error 

In studies of the effect of a time-independent exposure, random exposure
measurement error generally leads to bias towards the null and loss of power.
However, the consequences of random exposure measurement error are much more
complex in longitudinal studies of a time-dependent exposure in the presence
of time-varying confounders. Specifically, in such a study, exposure history
prior to time t needs to be considered as a potential confounder for the effect
of exposure at t, even under the sharp null hypothesis of no causal effect of
exposure at any time on the outcome Y. Since random measurement error in a
confounder can cause bias in any direction, random error in recorded BMI can,
in principle, cause bias even under the null! See Ref. (6). Furthermore, this
random error should be seen as including not only errors in measurement of BMI
but also short term fluctuations in BMI due to illness, a New Year's resolution to
lose weight, etc. These random fluctuations in BMI may have little effect on
eventual mortality, but they can easily obscure the actual trend in someone's
BMI for periods of up to a year. Thus if we use a monthly scale of analysis
as described above, the random fluctuations in BMI may dominate any trend
within a subject. Further, given that past BMI must be controlled for in the
regression models for current BMI used in g-estimation, the true correlation
between past and present BMI trends within a person will be obscured by random
fluctuations, which can even result in bias away from the null. This can occur
when the confounding effect of past trends in BMI is inadequately controlled
due to the random mismeasurement in past BMI. What to do?

One approach would be to specify a complex statistical model for the
relationship between true and mismeasured BMI. At present, I tend to seriously
doubt the robustness of such an approach owing to inevitable model
misspecification.

The alternative is to increase the "time" between measurements used in the
analysis from, say, 1 month up to as high as 5-6 years. By increasing the time
between measurements, the problem of random fluctuations in BMI is markedly
reduced, as the BMI signal (the true difference between measurement occasions)
is made much greater, while the random fluctuations may not increase or may
even decrease if the fluctuations are autocorrelated on a time scale of a few
to many months. The drawback of increasing the "time" between measurements
in the analysis is that this can lead to poorer control of the confounding
attributable to evolving time-varying factors. As an example, because the
temporal ordering of events between the measurement times used in the analysis is
lost, the confounding effect of changes in exercise may be incorrectly attributed
to a causal effect of BMI.
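
A small arithmetic sketch may make the signal-versus-fluctuation argument
concrete. It rests on assumptions that are not in the text: a constant true
trend, fluctuations at the two measurement occasions that are independent with a
common standard deviation, and a purely illustrative standard deviation of 0.3
BMI units. The function name `signal_to_noise` is hypothetical.

```python
import math


def signal_to_noise(trend_per_month, noise_sd, interval_months):
    """Ratio of the true BMI change over the interval to the sd of its noise.

    The observed change is the difference of two noisy measurements, so under
    independence its noise has standard deviation sqrt(2) * noise_sd, whatever
    the length of the interval.
    """
    signal = trend_per_month * interval_months
    noise = math.sqrt(2.0) * noise_sd
    return signal / noise


# A trend of 1 BMI unit per decade, measured 1 month apart versus 5 years apart.
monthly = signal_to_noise(0.1 / 12, 0.3, 1)
five_year = signal_to_noise(0.1 / 12, 0.3, 60)
```

Under these assumptions the ratio grows in proportion to the interval, so the
five-year ratio is sixty times the monthly one; with autocorrelated fluctuations
the comparison would differ in detail.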

At present I would recommend repeating one's analysis using a number of
different "times" between measurements and reporting all results. In this way, the
sensitivity of one's conclusions to the choice of the "time" between measurements
will be known. If important, this sensitivity will stimulate further discussion and
the development of better analytic methods.

7 Appendix 1: 

7.1 A Formal Definition of a Joint SNFTM for $X_m$ and a
SNMM for $Y_m \mid X_m$

The definition here is the alternative, more intuitive and more general definition
mentioned in the main text. The equivalence with the definitions in the main
text is proved below.

We first consider the uncensored case. The observed data are
$O = (\bar A(K), \bar L(K+1), X, Y)$, where $X$ is a continuous time to event variable and
$Y$ is measured at $K + 1$. The counterfactual data are $(X_m, Y_m)$, $m = 0, \ldots, K+1$,
denoting $X$ and $Y$ under treatment regimes where one experiences his observed
treatment $\bar A(m-1)$ up to $m$ and then receives no treatment (treatment level 0)
thereafter. We make the assumption that $X_{K+1} = X$, $Y_{K+1} = Y$. The covariate
$L(k)$ precedes $A(k)$, which precedes $L(k+1)$.

The function
$x^\ddagger_m(x, \bar L(m), \bar A(m)) = S^{-1}_{X_m \mid \bar L(m), \bar A(m)}\{S_{X_{m+1} \mid \bar L(m), \bar A(m)}(x)\}$
is a counterfactual conditional quantile-quantile function, where $S$ and $S^{-1}$
denote a survivor function and its inverse. It is a standard result that
$x^\ddagger_m(x, \bar L(m), \bar A(m))$ is the unique function for which
$X^*_m = x^\ddagger_m(X_{m+1}, \bar L(m), \bar A(m))$ and $X_m$ have the same conditional
distribution, i.e.,

$X^*_m \mid \bar L(m), \bar A(m) \sim X_m \mid \bar L(m), \bar A(m)$    (54)

Define $X^\ddagger_{K+1} = X$ and then recursively define
$X^\ddagger_m = x^\ddagger_m(X^\ddagger_{m+1}, \bar L(m), \bar A(m))$.

Robins and Wasserman (7) proved the following
Theorem A1:

$X_m \mid \bar L(m), \bar A(m) \sim X^\ddagger_m \mid \bar L(m), \bar A(m)$    (55)

where we silently take such displays to hold for all $m = 0, \ldots, K$.

Furthermore, Robins (8,10) and Lok (9) proved the function $x^\ddagger_m$ is unique.
That is, if the above display holds with $X^\ddagger_m$ replaced by some
$H_m = h_m(H_{m+1}, \bar L(m), \bar A(m))$ and $H_{K+1} = X$, then the function $h_m$ must
be the function $x^\ddagger_m$.

A SNFTM for $X_m$ assumes

$x_m(X_{m+1}, \bar L(m), \bar A(m); \psi^*) = x^\ddagger_m(X_{m+1}, \bar L(m), \bar A(m))$    (56)

for a known function $x_m(x, \bar L(m), \bar A(m); \psi)$ satisfying
$x_m(x, \bar L(m), \bar A(m); \psi) = x$ if $\psi = 0$ or $A(m) = 0$, with $\psi^*$ an
unknown parameter vector. It follows immediately that

$X_m(\psi^*) \mid \bar L(m), \bar A(m) \sim X^\ddagger_m \mid \bar L(m), \bar A(m)$,    (57)

with $X_{K+1}(\psi^*) = X$ and $X_m(\psi^*) = x_m(X_{m+1}(\psi^*), \bar L(m), \bar A(m); \psi^*)$.    (58)

The uniqueness of $x^\ddagger_m$ implies that SNFTMs as defined in the text are also
SNFTMs as defined here.

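Recall $X^*_m = x^\ddagger_m(X_{m+1}, \bar L(m), \bar A(m))$ and define

$\gamma^\ddagger_m(\bar A(m), \bar L(m), x) = E[Y_{m+1} \mid \bar A(m), \bar L(m), X^*_m = x] - E[Y_m \mid \bar A(m), \bar L(m), X_m = x]$    (59)

which is equivalent to

$E[Y_{m+1} - \gamma^\ddagger_m(\bar A(m), \bar L(m), X^*_m) \mid \bar A(m), \bar L(m), X^*_m = x] = E[Y_m \mid \bar A(m), \bar L(m), X_m = x]$.    (60)

Define $Y^\ddagger_{K+1} = Y$ and then recursively define
$Y^\ddagger_m = Y^\ddagger_{m+1} - \gamma^\ddagger_m(\bar A(m), \bar L(m), X^\ddagger_m)$.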
Below we prove the following theorem.
Theorem A2:

$E[Y^\ddagger_m \mid \bar L(m), \bar A(m), X^\ddagger_m = x] = E[Y_m \mid \bar L(m), \bar A(m), X_m = x]$    (61)

Furthermore the function $\gamma^\ddagger_m$ is unique. That is, if the above display holds
with $Y^\ddagger_m$ replaced by some $H_m = H_{m+1} - h_m(\bar A(m), \bar L(m), X^\ddagger_m)$ and
$H_{K+1} = Y$, then the function $h_m$ must be the function $\gamma^\ddagger_m$.

An additive SNMM for $Y_m \mid X_m$ assumes

$\gamma_m(\bar A(m), \bar L(m), x; \beta^*) = \gamma^\ddagger_m(\bar A(m), \bar L(m), x)$    (62)

for a known function $\gamma_m(\bar A(m), \bar L(m), x; \beta)$ satisfying
$\gamma_m(\bar A(m), \bar L(m), x; \beta) = 0$ if $\beta = 0$ or $A(m) = 0$, with $\beta^*$ an
unknown parameter vector. It follows immediately that

$E[Y_m(\beta^*, \psi^*) \mid \bar L(m), \bar A(m), X_m(\psi^*) = x] = E[Y_m \mid \bar L(m), \bar A(m), X_m = x]$,    (63)

with $Y_{K+1}(\beta^*, \psi^*) = Y$ and
$Y_m(\beta^*, \psi^*) = Y_{m+1}(\beta^*, \psi^*) - \gamma_m(\bar A(m), \bar L(m), X_m(\psi^*); \beta^*)$.    (64)

The uniqueness of $\gamma^\ddagger_m$ implies that an additive SNMM for $Y_m \mid X_m$ as
defined in the text is equivalent to the additive SNMM for $Y_m \mid X_m$ as defined
here.
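Proof of Theorem A2: By backward induction.

Case 1: $m = K$.

$E[Y^\ddagger_K \mid \bar L(K), \bar A(K), X^\ddagger_K = x]$
$= E[Y_{K+1} - \gamma^\ddagger_K(\bar A(K), \bar L(K), X^*_K) \mid \bar A(K), \bar L(K), X^\ddagger_K = x]$
$= E[Y_{K+1} - \gamma^\ddagger_K(\bar A(K), \bar L(K), X^*_K) \mid \bar A(K), \bar L(K), X^*_K = x]$
$= E[Y_K \mid \bar A(K), \bar L(K), X_K = x]$,

where the first equality uses the definition of $Y^\ddagger_K$ and that
$Y_{K+1} = Y = Y^\ddagger_{K+1}$, the second uses that $X^\ddagger_K = X^*_K$ by
$X_{K+1} = X = X^\ddagger_{K+1}$, and the third is the definition of
$\gamma^\ddagger_K(\bar A(K), \bar L(K), X^*_K)$.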

Case 2: Assume true for $m + 1$. We prove true for $m$.

We will require the following Lemma.

Lemma:

$f(\bar L(m+1), \bar A(m+1) \mid \bar L(m), \bar A(m), X^\ddagger_m = x) = f(\bar L(m+1), \bar A(m+1) \mid \bar L(m), \bar A(m), X^*_m = x)$

Proof:

$f(\bar L(m+1), \bar A(m+1) \mid \bar L(m), \bar A(m), X^\ddagger_m = x)$
$= \{f(X^\ddagger_m = x \mid \bar L(m), \bar A(m))\}^{-1} f(\bar L(m+1), \bar A(m+1), X^\ddagger_m = x \mid \bar L(m), \bar A(m))$
$= \{f(X^\ddagger_m = x \mid \bar L(m), \bar A(m))\}^{-1} \times \frac{\partial x^{\ddagger\,-1}_m(x, \bar L(m), \bar A(m))}{\partial x} f(\bar L(m+1), \bar A(m+1), X^\ddagger_{m+1} = x^{\ddagger\,-1}_m(x, \bar L(m), \bar A(m)) \mid \bar L(m), \bar A(m))$
$= \{f(X^\ddagger_m = x \mid \bar L(m), \bar A(m))\}^{-1} \frac{\partial x^{\ddagger\,-1}_m(x, \bar L(m), \bar A(m))}{\partial x} f(\bar L(m+1), \bar A(m+1), X_{m+1} = x^{\ddagger\,-1}_m(x, \bar L(m), \bar A(m)) \mid \bar L(m), \bar A(m))$
$= \{f(X^*_m = x \mid \bar L(m), \bar A(m))\}^{-1} f(\bar L(m+1), \bar A(m+1), X^*_m = x \mid \bar L(m), \bar A(m))$
$= f(\bar L(m+1), \bar A(m+1) \mid \bar L(m), \bar A(m), X^*_m = x)$

where the first equality is by Bayes rule, the second by $X^\ddagger_m$ and $X^*_m$ both
having the same law as $X_m$ conditional on $\bar L(m), \bar A(m)$ and a change of
variables from $X^\ddagger_m$ to $X^\ddagger_{m+1}$, the third by
$(X^\ddagger_{m+1}, \bar L(m+1), \bar A(m+1))$ and $(X_{m+1}, \bar L(m+1), \bar A(m+1))$
having the same joint distribution, the fourth by the definition of $X^*_m$ and a
change of variables, and the fifth by Bayes rule.

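Now to the proof of case 2:

$E[Y^\ddagger_m \mid \bar L(m), \bar A(m), X^\ddagger_m = x]$
$= E[Y^\ddagger_{m+1} - \gamma^\ddagger_m(\bar A(m), \bar L(m), X^\ddagger_m) \mid \bar L(m), \bar A(m), X^\ddagger_m = x]$
$= E\{E[Y^\ddagger_{m+1} \mid \bar L(m+1), \bar A(m+1), X^\ddagger_m] \mid \bar L(m), \bar A(m), X^\ddagger_m = x\} - \gamma^\ddagger_m(\bar A(m), \bar L(m), x)$
$= E\{E[Y^\ddagger_{m+1} \mid \bar L(m+1), \bar A(m+1), X^\ddagger_{m+1} = x^{\ddagger\,-1}_m(x, \bar L(m), \bar A(m))] \mid \bar L(m), \bar A(m), X^\ddagger_m = x\} - \gamma^\ddagger_m(\bar A(m), \bar L(m), x)$
$= E\{E[Y_{m+1} \mid \bar L(m+1), \bar A(m+1), X_{m+1} = x^{\ddagger\,-1}_m(x, \bar L(m), \bar A(m))] \mid \bar L(m), \bar A(m), X^\ddagger_m = x\} - \gamma^\ddagger_m(\bar A(m), \bar L(m), x)$
$= E\{E[Y_{m+1} \mid \bar L(m+1), \bar A(m+1), X_{m+1} = x^{\ddagger\,-1}_m(x, \bar L(m), \bar A(m))] \mid \bar L(m), \bar A(m), X^*_m = x\} - \gamma^\ddagger_m(\bar A(m), \bar L(m), x)$
$= E\{E[Y_{m+1} \mid \bar L(m+1), \bar A(m+1), X^*_m = x] \mid \bar L(m), \bar A(m), X^*_m = x\} - \gamma^\ddagger_m(\bar A(m), \bar L(m), x)$
$= E[Y_{m+1} \mid \bar L(m), \bar A(m), X^*_m = x] - \gamma^\ddagger_m(\bar A(m), \bar L(m), x)$
$= E[Y_m \mid \bar L(m), \bar A(m), X_m = x]$,

where the first equality is by the definition of $Y^\ddagger_m$, the second by iterated
expectations, the third by the definition of $X^\ddagger_m$, the fourth by the induction
hypothesis, the fifth by the preceding Lemma, the sixth by the definition of
$X^*_m$, the seventh by the laws of probability, and the eighth by the definition of
$\gamma^\ddagger_m(\bar A(m), \bar L(m), x)$. Uniqueness is proved as in Refs (4,10) and is
omitted.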



An additive SNMM for $Y_m \mid X_m$ may not be appropriate for analyzing
censored data because of administrative censoring of X at time K, as discussed
in the text. As indicated in Section 4, our approach requires that we consider a
broader class of SNMMs, which we now describe.

Consider a collection of functions c m (x, A (to) , L (to)) indexed by to and de- 



(X m ,A (to) , L (to)) , Cl = c m (X m ,A (to) , L (to)) , C„ 



(X m ,A(m),L(m)) 



fine C* m = 

and Cm (tp) — 4n (X m ,A (to) , L (to) , tp) . For fixed A (to) , L (to) , c m (x, A (to) , L (to)) 
need not be a 1-1 function of x. The approach described in the text for handling
right censoring of X at time K + 1 amounts to selecting particular functions
$c_m$ that guarantee that $C_m(\psi)$ is an observable (i.e., uncensored) random
variable.

Let c m denote an arbitrary element in the range of c m . Redefine 
7^ (A (to) , L (to) , c m ) = £ [F m+ i|A (m) , Z, (m) , C m = c m ] -E [Y m \A (to) ,L(m),C m = c m ] 

(65) 

which is equivalent to 

£ [Y m +i - ll (A (to) , L (to) , C£) | A (to) , I (to) , C£ = c m ] = £ [y OT | A (to) , I (m) , C m = c m ] . 

(66) 

Define Y^ +1 = Y and then recursively redefine Y m = Y m+1 — 7^ (A (to) , L (to) , C^) . 
Below we prove the following theorem. 
Theorem A3: Suppose, for to = 0, K — 1, 

c L+i ( x > A(m + l),L(m+ 1)) = d m+i [c m {a4 (x, A (to) , L (to)) , A (m) , L (to)} , A (m + 1) , L ( 

(67) 

for some function d m +i (c TO , A (to + 1) , L (to + 1)) , where x^ (x, 
is as defined previously. That is, the function c m+1 = d m+ \ o c^ m ox^ m . Then 

E [Y m \L (to) , A (to) , Cl = x] = £ [Y m \L (to) , A (to) , C m = c m ] (68) 

Furthermore the function 7^ is unique. That is, if the above display holds 
with Y m replaced by some H m = H m+ \ — h m (A (m) , L (to) , C m ) and Hk+i = 
Y, then the function h m must be the function 7^. 

Remark: Eq. (67) is needed in the hypothesis of Theorem A.3 because
the function $c_m(x, A(m), L(m))$ need not be a 1-1 function of x.

An additive SNMM for Y m \C m assumes 

7 TO ( A (to) , L (m) , c m ; /?* ) = 7^ (A (m) , L (to) , c m ) (69) 

for a known function 7 m (A (to) , L (to) , c m ; /?) satisfying 7 m (^4 (to) , L (to) , c m ; /3) = 
if /3 = or A (m) = with /?* an unknown parameter vector. 
It follows immediately that 

£[r m (/3\V*) |X(TO),3(TO),C m (y) =z] =S[y m |I(m),3(m),C m = a:] , 

(70) 

with Y K +1 (/?* , ^* ) = Y and F m (/?' , V*) = W (/3* , V*) - 7™ (m) , I (to) , C m (V );/?*) 

(71) 

Proof of A. 3 : We only describe where the proof differs from that of its special 
case Theorem A. 2. The proof is essentially identical except for the replacement 
of X m by C m , x by c m , and x^ 1 (x, L(m) ,A (to)) by 

<L+i{ x m 1 (4r 1 (c m , I (to), A (to)) , I (to), A (to)) ,L(to + 1),A(to + 1)} 

(72) 

The only problematic point is that, now, since x^ * (cj^ * (cm> (to) , ^4 (to)) , -L (to) , A (to)) 
is the subset X (c m , L (to) , A (to)) = {a; : (x^ (x, L (to) , A (to)) , L (to) , A (to)) = c m } 
of the nonnegative real line, a necessary condition for 
c L+i H« 1 (4r 1 (cm. £ (m) , 4 (m)) , L (to) , A (to)) , Z, (to + 1) , -4 (to_+ 1)} to 
be a well denned function is that every element of the set X (c m , L (to) , A (to)) 
has the same image under cj ra+1 (•, L (to + 1) , A (m + 1)) . The choice 

c^ +1 (-,L(m + 1) ,3(m + 1)) = 4, {4, (■, A (to) ,I(to)) , A (to) ,L (to)} sat- 
isfies this constraint with the image being c m itself. More generally, the choice 
of c] n+1 (•, L (to + 1) , A (to + 1)) given in Eq 67 satisifies the constraint. 

8 Appendix 2: Estimation of Effects with the
Parametric G-formula and IPTW When a Sufficiently
Long Minimal Latent Period Exists:

In this section we show that the parametric G-formula and IPTW can
be used to estimate certain causal effects when there exists a sufficiently long
minimal latent period. We begin with a preliminary discussion of these two
methods of estimation.

8.1 Preliminaries: 

In this preliminary discussion, we assume that, as in Section 3.1.1, there is 
neither confounding by pre-clinical disease nor a minimal latent period. Specif- 
ically we assume, for each regime g, the CO 9 assumption that, for each j, 
(Y 9 , X 9 ) II A a A (j) \L (j) ,A 9 A (j-l) = (j - 1) , 5» (j) = 1 holds, with E 9 (to) 
defined in Equation (53). 

Recoding: Without loss of generality, we henceforth redefine (ie recode) 
L (j) such that El 3 (j) is now one of the components of L (j) but we remove from 
L (j) the components corresponding to X, ie the components (XI (X < j) ,1 (X < j)) . 
Then we can write the CO 9 assumption as 

CO 9 : (Yf, X 9 )UA 9 A (j) \L (j) , A 9 A (j - 1) = (j 1) , (XI (X < j) ,I(X< j)) 

(73) 

since, from their definitions, ^ 9 (j) = implies A 9 A (j) = 0. The CO 9 assumption 
implies 

(Y 9 , X 9 )UA 9 A (j) \L (j) ,A 9 A (j - 1) = (j - 1) , (XI (X < j) ,I(X< j)) (74) 

since A g A (j - 1) = 0(j - 1) implies (Yf,X 9 ) = (Y 9 ,X 9 ). This last display is 

the standard definition of no unmeasured confounding given $(L(j), (X I(X < j), I(X < j)))$
for the effect of $A^g_\Delta(j)$ on the counterfactuals $Y^g_0, X^g_0$. Let
$\lambda(u \mid \cdot) = \lim_{h \to 0} \mathrm{pr}[u < X < u + h \mid \cdot, u < X]/h$
be the conditional hazard of X given $\cdot$.

Robins (14,15) proves that Eq 74 implies that $S_{X_0^g}(u) = \mathrm{pr}(X_0^g > u)$ is
identified via

$$S_{X_0^g}(u) = \int \cdots \int \exp\left\{ -\int_0^u \lambda\left(t \mid L(t), A^g_\Delta(t) = 0\right) dt \right\} \times \prod_{m=0}^{\lfloor u \rfloor} dF\left[L(m) \mid L(m-1), A^g_\Delta(m-1) = 0, X > m\right] \quad (75)$$

$$= E\left[ I\{X > u\}\, I\{A^g_\Delta(u) = 0\}\, W^{g,*}(u) \right] \quad (76)$$

$$W^{g,*}(u) = 1 \Big/ \prod_{m=0}^{\lfloor u \rfloor} \mathrm{pr}\left[A^g_\Delta(m) = 0 \mid L(m), A^g_\Delta(m-1) = 0, X > m\right] \quad (77)$$

where the first formula for $S_{X_0^g}(u)$ is referred to as the g-computation algorithm
formula (g-formula, for short) and the second formula as the IPTW formula.
To shorten the formulae we have written 0 as shorthand for 0(t) when the time
t is clear. In fact Robins (14, 15) shows that the assumption



$$X^g_0 \amalg A^g_\Delta(j) \mid L(j), A^g_\Delta(j-1) = 0(j-1), X > j, \quad (78)$$

which is implied by the assumption of Eq (74), suffices to establish the identifying
formulae. To estimate $S_{X_0^g}(u)$ we can use either the parametric g-formula
estimator, which replaces the unknowns $\lambda(t \mid L(t), A^g_\Delta(t) = 0)$ and
$f[L(m) \mid L(m-1), A^g_\Delta(m-1) = 0, X > m]$ in the first formula by estimates
based on parametric models, or the IPTW estimator, which replaces the unknown
$\mathrm{pr}[A^g_\Delta(m) = 0 \mid L(m), A^g_\Delta(m-1) = 0, X > m]$ in the second formula
with a parametric estimate and the unknown expectation with a sample average.
Both approaches are alternatives to g-estimation of structural nested models.
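
The IPTW formula can be illustrated with a short sketch. The code below is only an illustration under stated assumptions, not the estimator used in the paper: it assumes discrete monthly time points, that the probabilities of no deviation from the regime have already been estimated by some parametric model (passed in as the hypothetical array `p_zero`), and that `deviation[i][m]` is a 0/1 indicator of whether subject i deviated from regime g in month m. The function and argument names are introduced here for illustration.

```python
import math

def iptw_survival(x, deviation, p_zero, u):
    """Sketch of an IPTW estimate of the counterfactual survival probability at time u.

    x         : observed failure times, one per subject
    deviation : deviation[i][m] is 1 if subject i deviated from the regime in month m, else 0
    p_zero    : p_zero[i][m] is a previously fitted probability of no deviation in month m
    u         : time at which survival is evaluated
    """
    last = math.floor(u)  # months 0..floor(u) enter the weight
    total = 0.0
    for xi, dev_i, p_i in zip(x, deviation, p_zero):
        alive = xi > u
        followed = all(d == 0 for d in dev_i[: last + 1])
        if alive and followed:
            weight = 1.0
            for p in p_i[: last + 1]:
                weight /= p  # inverse probability of following the regime
            total += weight
    # Sample average replaces the unknown expectation.
    return total / len(x)

# Hypothetical data: four subjects, one month of follow-up.
x = [1.0, 1.0, 0.2, 1.0]
deviation = [[0], [1], [0], [0]]
p_zero = [[0.5], [0.5], [0.5], [0.5]]
estimate = iptw_survival(x, deviation, p_zero, u=0.5)
```

Subjects who deviate or fail before u contribute zero; the remaining subjects are up-weighted by the inverse of their estimated probability of having followed the regime.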

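Robins (14,15) proves that $E[Y^g_0]$ is identified under the assumption of Eq (74)
by

$$E[Y^g_0] = \int_0^{K+1} dx \int \cdots \int \lambda\left(x \mid L(x), A^g_\Delta(x) = 0\right) \exp\left\{ -\int_0^x \lambda\left(t \mid L(t), A^g_\Delta(t) = 0\right) dt \right\}$$
$$\times \prod_{m=0}^{\lfloor x \rfloor} dF\left[L(m) \mid L(m-1), A^g_\Delta(m-1) = 0, X > m\right] \prod_{m=\lfloor x \rfloor + 1}^{K+1} dF\left[L(m) \mid L(m-1), A^g_\Delta(m-1) = 0, X = x\right]$$
$$\times E\left[Y \mid L(K+1), A^g_\Delta(K) = 0, X = x\right]$$

$$= E\left[ Y\, I\{A^g_\Delta(K) = 0\}\, W^{g,*} \right]$$

with

$$W^{g,*} = W^{g,*}(X) \left\{ 1 \Big/ \prod_{m=\lfloor X \rfloor + 1}^{K} \mathrm{pr}\left[A^g_\Delta(m) = 0 \mid L(m), A^g_\Delta(m-1) = 0, X, X < m\right] \right\}$$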



In the above formulae, we have assumed for simplicity that X has support
on (0, K + 1) so censoring for X is absent.

We next consider whether $S_{X_0^g}(u)$ and $E[Y^g_0]$ remain identified in the presence
of confounding by pre-clinical disease and a sufficiently long minimal latent
period (MLP).

8.2 Identification and Estimation of $S_{X_0^g}(u)$:

The following theorem establishes the identification of S X [> (u) . First note under 
our recoding, the RC 9 assumption becomes 

RC° : (Yf, X 9 ) UA A (j) \L (j) , A 9 A (j -1),U (j) = 0, (XI (X <j),I(X< j)) , 

(79) 

Theorem A4: Given a regime g, let a g-specific MLP satisfy the definition
of a MLP of Sec. 3.2.1 except with $X_k$ and $X_m$ replaced by $X^g_k$ and $X^g_m$ and
$A(m)$ replaced by $A^g_\Delta(m)$. Suppose $A^g_\Delta(m)$ has a g-specific MLP of $\chi$ months
for its effect on X, where $\chi$ exceeds the time $\varsigma$ in the CD$^g$ assumption. Then,
under the CD$^g$ and RC$^g$ assumptions, $S_{X_0^g}(u)$ remains identified by both the
g-formula and the IPTW formula when the recoded $L(t)$ and $A^g_\Delta(t)$ are redefined
as $L^\dagger(t)$ and $A^{g\dagger}_\Delta(t)$, where

$$L^\dagger(t) = L(t - \chi), \quad A^{g\dagger}_\Delta(t) = A^g_\Delta(t - \chi). \quad (80)$$

The theorem thus states that the identifying formulas are the usual g-formula
and IPTW formula, except that we replace both the treatment variable $A^g_\Delta(t)$ and
the covariate variable $L(t)$ by their values $\chi$ time units earlier. [For the IPTW
formula the transformation is applied to $W^{g,*}(u)$.] It is important to emphasize
that a similar transformation is not applied to X. Thus the conditioning event
$L(m-1), A^g_\Delta(m-1) = 0, X > m$ transforms to
$L(m-1-\chi), A^g_\Delta(m-1-\chi) = 0, X > m$.
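
The transformation in the theorem amounts to lagging the covariate and treatment series by the latent period while leaving the failure time untouched. The sketch below illustrates this bookkeeping only. The function name `lag_history`, the toy series, and the choice to mark unavailable early entries as `None` are assumptions of this example, not part of the paper.

```python
def lag_history(values, chi):
    """Return the series shifted chi time units later: entry t holds values[t - chi].

    Entries with t - chi < 0 have no earlier value and are set to None
    (how to treat them is an assumption of this sketch).
    """
    if chi < 0:
        raise ValueError("chi must be non-negative")
    return [values[t - chi] if t >= chi else None for t in range(len(values))]

# Hypothetical monthly covariate values and regime-deviation indicators.
covariate = [10, 11, 12, 13]
treatment = [0, 1, 0, 0]

# Both series are lagged by the same latent period; the failure time X is not shifted.
lagged_covariate = lag_history(covariate, 2)
lagged_treatment = lag_history(treatment, 2)
```

The lagged series would then be supplied to the usual g-formula or IPTW computation in place of the original ones.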

Proof of Theorem: It suffices to show Eq. (78) holds when L (t) and A 9 A (t) 
arereplaced by Lt (t) and A B * (t) . By RC 9 , (Yf,X?) UA% (j) \L (j) , A 9 A (j - 1) = 
0, U (j) = 0, X > j. Thus, (Yf, X?) II A% (j) \L (j) , A 9 A (j- 1) = 0, U (j) = 
0,X> j, X] > j + X- By CD 9 and X > q, (Yf,X 9 ) UA 9 A (j) \L (j) , A 9 A (j - 1) = 

•'•■V; •./ • \..v -j. 

Thus (Yl_ x ,X a m _ x ) UA A (m-x)\L(m-x),A 9 A (m- X -l) = 0,X > 
(to — x) , ^m-x > 171 m = X + j- 

Now the event X > (m — x) is the event X^ n _ x > (to — X ) -Further, by x a 
g-specific minimal latent period we also have the event X^_ x > to is the event 
X > m. Thus we have (Y£_ x , X^_ x ) UA A (m- X ) \L(m - X ) ,A 9 A (m - X ) = 
(to - x) , X > to. Since, given A A (to - x) = (to - x - 1) , we have (Y^_ x , X 9 m _ x ) = 
(Y 9 , X 9 ) , we conclude (Y 9 , X 9 ) II 4» (m - x) \L (m - x) , (m - x - 1) = 
0(m — X ) ,X > m, which is exactly Eq. (78) with L (t) and A 9 A (t) replaced by 
Z,t ( t ) and A 9 A A (t) , proving the theorem. 

In contrast, under the conditions of the previous theorem, $E[Y^g_0]$ is not
identified because Eq. (74), in contrast to Eq. (78), fails to hold when $L(t)$
and $A^g_\Delta(t)$ are replaced by $L^\dagger(t)$ and $A^{g\dagger}_\Delta(t)$. Specifically, Eq. (74) can be
written as the conjunction of Eq (78),

$$(Y^g_0, X^g_0) \amalg A^g_\Delta(m) \mid L(m), A^g_\Delta(m-1) = 0, X, m > X > m - \chi + \varsigma \quad (81)$$

and

$$(Y^g_0, X^g_0) \amalg A^g_\Delta(m) \mid L(m), A^g_\Delta(m-1) = 0, X, m - \chi + \varsigma > X. \quad (82)$$

Below we show that under the conditions of the previous Theorem, Eq. (81)
holds but Eq (82) does not when $L(t)$ and $A^g_\Delta(t)$ are replaced by $L^\dagger(t)$ and
$A^{g\dagger}_\Delta(t)$. To show (81) we modify slightly the proof of Eq (78) as follows:

(Yf, X 9 ) U A 9 A (j) \L (j)_,A 9 A (j - 1) '= 0, U (j)_= 0, X > j (by RC 9 ) 

(Y 9 ,X 9 ) II A 9 A (j) \L (j) , A 9 A (j -1) = 0,U (j) =0,X> j, X 9 , j + < < 
X 9 <J+ X 

=► (Yf, X 9 ) U A 9 A (j) \L (j) , A 9 A (j - 1) = 0, X > j, X 9 ,j + c < X 9 < j + x 
(by CD 9 ) 

=► (Y^_ x , X 9 m _ x )UA 9 A (m- X )\L(m- X ),A 9 A (m- X -l)=0,X>(m- X ), X 9 n _ x ,m > 
X 9 m _ x > to - x + <T- 

^ (Y 9 ,X 9 )UA 9 A (to - X ) \L (to - x) , A 9 A (m - X - 1) = 0, X^_ x > (to - X ) , X 9 m _ x , to > 
X L- X > m- X + q 

^ (y 9 ,X 9 ) n^(m-x)|I(m-x),3i(TO-x-l) = 0,X,m > X > 
m - x + ? by the g-specific MLP assumption. 

The proof of (82) fails because the event L (j) , A 9 A (j - 1) = 0, U (j) = 
0, X > j, Xj, X 9 < j + c is not the same event as L (j) , A 9 A (j - 1) = 0, U (j) = 
0, X > j,X 9 ,X 9 < j + <; under CD 9 because X 9 - < j + <r does not imply 
U(j) = 0. 

Proof that E [Yj] is nonparametrically identified when a sufficiently 
long MLP exists . In Section 3.3, we stated that E [Yj] is nonparametrically 
identified under the conditions of the previous theorem with the regime g in the 
theorem being the regime that always assigns exposure zero. A proof follows. 

Let IN, A 1 , S T , YJi, X^ be as defined in Section 3.3 where we recall that 
because of the existence of the MLP of length x > ^ au subjects with ^ < 
X m < m + c have IN(m) = 1. First in Eqs 78, 81, 82 we replace (Y 9 ,X 9 ) 
by (Yq T , Xq ) , A 9 A (m) by A T (m — x) > an d redefine L (to) as L (to — \) with the 
component 5 (to) of L (to) being replaced by S T (to — x) • Eq 82 now holds 
trivially because with probability one to — \ + ? > X implies IN (m — x) = 1 
and thus S T (to — \) — and ^4 T (to — x) = 0. Furthermore the proofs of Eqs 78 
and 81 go through as above with only minor notational changes. We therefore 
conclude that Eq 74 holds and thus that E [Yj] is nonparametrically identified. 
The identifying IPTW formula is explicitly given by 



E [y T ] = E [YI {A T (K- X ) = 0} W 9 '*] 



m =yx\ 

n 



pr 



A T (to - x) - 0|L (to - X ) , A 1 (to - x - 1) = 0, X > m 




Ai (m-x)=0\L(m- X ),A 1 (m - * - 1) = 0, X 



9 Appendix 3: Optimal Regime Models:

Suppose we now wish to estimate the regime $g_{opt}$ that maximizes $E[Y^g_0]$ over
all regimes g of the previous subsection. We will do so by specifying an optimal
regime structural nested mean model and associated SNFTM.

To begin, consider the dietary intervention $(a(k), \underline{g}_{opt,k+1})$ in which one
follows their observed diet up to month k, has a BMI increase of $a(k)$ over their
maximum previous BMI in month k, and follows the unknown optimal regime
g opt thereafter. Let Y alykS> '-°p t . k + 1 , -°p*,<=+i be the associated counterfactu- 

als. When A (k) — a (k) , write g for the regime A (k) , g . Note 

— Optj/C-pJ. — Opt _L 

X-opt,K+l = X. 

We will make the following assumptions: 

Optimal regime RC Assumption : A (to) is statistically independent of 

^ Y <rn),g opttm+ ^ x a(rn),g opt m+ X given S ( m ) = 1, 1( m ), A (m - 1) and U (to) = 
(m) for each a (to) > 

Optimal Regime CD Assumption: 

X^opt.m > m + £ =>U(m) = 0(m) (83) 

We next recursively define the random variables x"' m '-opt,m+i by t nc 
relationship that X-°p*. K + 1 (ip) — X and, for m = K, ...,0. 

X 0(m) ^ P t, m+ i (ip)= m + exp {co (a (m) , A (to - 1) , L (to) , V>) } (V (m) '^p*.™+i (V>) - to) 

if < X a(m) '^>-"+i - m < 1 
X 0(m) '^ P t, m+ i (^,) = x a(m) '^^+i (</>) + {exp {uj (a(m) , A (to - 1) ,L(m) , ip)} - l} 

i/ KI a(m) ^^+i (V>) - m 

i/ X a(m) '^-+i (V) < to, 

These equations recursively define X alja ^ -°pt,"»+i (^) m terms of the observed 
data, the regime g^ t m+1 and the parameter vector ip as can be verified by noting 

that these equations imply the following relationship between X a ^ m>> '-°p t - m + 1 (ip) 
and X-°p*.™+ 1 (ip) . 

v a(m), g , exp {oj (A (m) , A (m - 1) , L (m) , ip) } g 

exp {w [a (to) , A (m — 1) , L (to) , ) 

if < (ip)-m<l, < X a(m) '^p*>™+i (V>) < 1 



X >• "2.opt,m + l 

= (V 1 ) + exp { uj (A (to) ,A(m-l) ,L (to) , V>) } - exp { w (a (m) , A (m - 1) , L (m) , -0) } 

i/ l<x a(m) '^p*.™+i (V>) - m, l<X^*.-+i (^) _ m 



X a( " l) '^ pt , m+ : 



(V>) = m + exp (A (m),A(m-l),L (to) , 0) } [xk pt ' m+1 {ip) - rnj 
+1 — exp {uj (a (m) , ^4 (m — 1) , L (to) , ip) } 



if o < xi° pt - m+1 (ip) — to < i, i< x a(m) ^p*.-+i - to 



(m);S { [exp {a; (A (to) ,A(m-l),L (to) , } - l] + (g^m+i (VQ - to) } 

X _o P t, m +l — m _)_ i i _ U _J i ^ L-L 

exp [uj (a (to) , A (m — 1) , L (to) , ip) ) 

if o < x a(m) '^p*>-+i - to (v>) < i, i< x^vt, m +i _ m (^,) 
We next assume an optimal regime SNFTM given by 

X a(m) ^ pt , m+1 ^*) = X a(m) ' 9 -oP*,™+iwpl (84) 

for an unknown value ip* of the vector ip. 

We also assume the optimal regime SNMM 

7 TO [a(k) ,a(m - 1) ,I(m) ,x,(3*] = (85) 
E \ Y a{k) ' 9 -o^\L m = l m , A m = a m ,X m ^ Pt , k+ i (V-*) = x 



y°( fe )^ Pt , fc+1 |i m =7m) A m = a m ,X 0(fc) ^^ +1 = x 



Above u {a{t) ,a{t — 1) ,1 {i) ,tf) and 7 m (a (m) , a (m — 1) , 7 (to) , if) are known 
functions j m [a (k) , a (to — 1) , 7 (to) , x, /?] satisfying cj (a (t) , a (i — 1) , I (£) , if) = 
if a (t) = or V = and j m (a (to) , a (to — 1) , 7 (to) , /?) = if a (m) = or 
/? = 0. 

Recall that the optimal regime itself remains unknown. However, we show below
that the following algorithm, evaluated at the true $(\beta^*, \psi^*)$, would find the
optimal regime $g_{opt}$ under the following additional condition, which we henceforth
assume to hold.

Additional Condition: For each $a(m-1), l(m), x, \beta, m$, the function
$\gamma_m[a(k), a(m-1), l(m), x, \beta]$ is either everywhere zero or is strictly concave
in $a(k)$ on the support of $A(k)$.

Optimal Regime Algorithm: 

Given any {(3,ip) , calculate g op t,(f3,i>) = {9o P t(p,i>),m [a(m),7(m))] ;ra = K, ...,0} 
as follows. 

Calculate X 0{K) '^^^ {if) . Define 

9* op t {M ),K [A{K-1),L{K))} 

= I{X<K) arg max [>y K {a{K), A {K - 1) , L (K) , X, 0)] + I {X > K) x 

a(K) 



arg max E 

a(K) 



1K [a {K) , A {K - 1) , L {K) , X 0{K) '**p*.*+>- (V) , fi} \A {K - 1) , L (K) , X > k] 

Calculate g op t(/3,i>),K P {K - 1) , L {K))] = min j A (K) , g* pt ^^) tK [A{K-1),L(K))}} 

Calculate X^wm.k ^) = ^w.ftif^W.M'f)]^,^, ^ . 
Recursively for to = K — 1, 0, calculate 



9*o P t(0,4,),m P(to-1),L(to))] 
%) arg max [7 m {. 

a(m) 

7m | a (to) , A (to - 1) , I (to) , X° (m) '^p t( 3,*),m+i (^,) , /3| |3 (to - 1) , I (to) , X > to 



J(X < to) argmaxS [7 m {a(m) , A (to - 1) , L (to) , X, /?}] + / (X > to) x 

a(m) 



arg max 

a(m) 
Calculate g op t(0,4>),m [A (to) , L (m))] = min | A (to) , g* pt{l3ip)m \A(m-l),L (to 
Calculate X^w.*>.™ ty) - i»»^,*),»R™),lH)],?, p(]m+1 ^ 
Note to carry out this algorithm we will need to be able to estimate 
E [ 7m {a (to) ,A(m - 1) ,L (m) , I° W 'V(M),»ti (^,) , \A(m - 1) ,L (m) , X > 

for all possible values of $a(m)$ in the support of $A(m)$. One possibility is to
specify and fit an appropriate multivariate regression model with the possible
values of $a(m)$ indexing the multivariate outcomes at time m.

To understand why this is the correct algorithm, we first note that any regime
at m can be a function of X only if X < m, so that X is known by m. When X >
m, we must average over $X^{a(m),\underline{g}_{opt(\beta,\psi),m+1}}(\psi)$ because
$X^{a(m),\underline{g}_{opt(\beta,\psi),m+1}}(\psi)$ is a function of X. When X > m,
$X^{a(m),\underline{g}_{opt(\beta,\psi),m+1}}(\psi)$ will be the value of
$X^{\underline{g}_{opt(\beta,\psi),m}}(\psi)$ if the regime $g_{opt(\beta,\psi)}$ dictates the exposure $a(m)$.
The optimal regime will choose the $a(m)$ that optimizes the contribution to the
utility at time m. But the optimizing $a(m)$ depends on the $a(k)$ chosen for the
regime for k > m. Thus we need to use backward recursion to estimate the optimal
regime.
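
The backward recursion can be sketched on a toy problem. The code below is a simplified illustration, not the paper's algorithm: it ignores covariates, failure times, and estimation, and assumes a finite action set and a known, hypothetical `contribution` function whose value at time m may depend on the actions already chosen for later times. All names are introduced here for illustration.

```python
def backward_optimal(K, actions, contribution):
    """Choose actions for m = K, ..., 0, each given the actions already chosen for later times.

    contribution(m, a, later) returns the (hypothetical) contribution to utility of
    action a at time m, where `later` is the tuple of actions chosen for times m+1..K.
    """
    later = ()
    for m in range(K, -1, -1):
        best = max(actions, key=lambda a: contribution(m, a, later))
        later = (best,) + later
    return later  # later[m] is the action chosen at time m

def toy_contribution(m, a, later):
    # At the last time the best action is 1; at time 0 the best action
    # depends on the action already chosen for time 1.
    if m == 1:
        return -(a - 1) ** 2
    return -(a - later[0] - 1) ** 2

plan = backward_optimal(1, (0, 1, 2), toy_contribution)
```

Because the choice at time 0 depends on the choice at time 1, the recursion has to start at the last time point and move backward.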

To be more specific consider the subgroup of subjects with a history 
(A(K -1) ,L(K),X) with X < K so X = (^*) eL(K). Then 

a ( K ) = 9* op t[ ^),K PC^ - that maximizes 

$\gamma_K\{a(K), A(K-1), L(K), X, \beta^*\}$ is clearly the optimal treatment choice at
K. However, we are only considering regimes (interventions) that do not force
subjects to gain weight. We now argue that for any subject with $A(K)$ less than
$g^*_{opt(\beta,\psi),K}[A(K-1), L(K)]$, the optimal decision is not to intervene at all,
so the subject receives his observed treatment $A(K)$. The subject with $A(K)$
less than $g^*_{opt(\beta,\psi),K}[A(K-1), L(K)]$ could still have received any treatment
between 0 and $A(K)$. However, among this set of treatments, the treatment
$A(K)$ is optimal by Condition a) above.

Next consider the subgroup of subjects with a history $(A(K-1), L(K), X)$
with X > K, so $X \notin L(K)$ and $X^{a(K),\underline{g}_{opt,K+1}}(\psi^*) > K$. To find the
optimal treatment we average over $X^{a(K),\underline{g}_{opt,K+1}}(\psi^*)$. Since the average over
$X^{a(K),\underline{g}_{opt,K+1}}(\psi^*)$ of a function that is concave in $a(K)$ for every possible
value of $X^{a(K),\underline{g}_{opt,K+1}}(\psi^*)$ remains a concave function of $a(K)$, we again take

$$g_{opt(\beta,\psi),K}[A(K-1), L(K)] = \min\{A(K), g^*_{opt(\beta,\psi),K}[A(K-1), L(K)]\}.$$

That the same argument holds for each m is a standard dynamic programming
argument as discussed in Robins (2004).
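
The role of concavity in the min rule can be checked numerically. The sketch below is an illustration under assumptions: the function `gamma` is a hypothetical strictly concave function that is zero at zero exposure, and a simple grid search stands in for the maximization. It shows that when treatments are restricted to lie between 0 and the observed value, the constrained maximizer is the smaller of the observed value and the unconstrained maximizer.

```python
def constrained_argmax(gamma, upper, n_grid=300):
    """Grid search for the maximizer of gamma over the interval [0, upper]."""
    grid = [upper * i / n_grid for i in range(n_grid + 1)]
    return max(grid, key=gamma)

def gamma(a):
    # Hypothetical strictly concave function with gamma(0) = 0, maximized at a = 2.
    return 4.0 - (a - 2.0) ** 2

unconstrained_best = 2.0

# Observed exposure below the unconstrained optimum: the best allowed choice is no intervention.
small = constrained_argmax(gamma, upper=1.0)

# Observed exposure above the unconstrained optimum: the best allowed choice is the optimum itself.
large = constrained_argmax(gamma, upper=3.0)
```

In both cases the grid search agrees with the smaller of the observed exposure and the unconstrained maximizer, which is the form of the rule in the display above.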

Since (/3*,ip*) are unknown we must estimate them by g-estimation. Define 

K 

Y*rw-*> ((3, iP)=Y-Y J lm [A(m),A(m - 1) ,L(m) , X°°^ (iP) , 0\ . 

m 

Note these equations are much more complex than the equations for $\psi$ using
a SNFTM and SNMM for a fixed g in that $g_{opt}$ is now not known but depends
on the parameters $(\beta, \psi)$ through the above algorithm for $g_{opt,(\beta,\psi)}$. Thus we can
no longer estimate $\psi^*$ independently of $\beta^*$ since $X_m^{\underline{g}_{opt(\beta,\psi)}}(\psi)$ is now a
function of $\beta$ as well as $\psi$ through its dependence on $g_{opt(\beta,\psi)}$. Rather, we must
solve both pairs of g-estimation equations simultaneously.

Specifically given the optimal regime RC and CD assumptions, to obtain 
CAN estimators of the unknown parameters, we find jointly (j3, ipj so that both 

the score test for the covariate vector depending on Im P '" J, * ) (-0) is precisely 
zero and the score test for the covariate vector depending on Ym^ 13 '^ (/?, 0) 
is precisely zero (both tests restricted to subjects with Im !>t( ' J, ' ,) (ip) > £ and 
S (to) = 1.) This turns out to be a very difficult computational problem. Robins 
(4) describes a number of computational simplifications, but they are beyond 
the scope of the current paper. Finally we obtain g opt ^^ as our estimate of 

the optimal regime g opt (p* ,,/,*) and n^ 1 J2i Yq""*^'*^ as our estimate 

of the expected utility E [V 9opt ] under the optimal regime. 

Both estimation of $E[Y^g_0]$ for a known g and of $E[Y^{g_{opt}}]$ can be modified
to allow for censoring at the end of follow-up at K + 1 and for intractable
unmeasured confounding in certain subgroups using methods exactly analogous to the
methods for estimation of E [Y ]. 



References 

[1] Willet W. (2000) NEJM 

[2] Robins JM, Hernan MA, Siebert U (2004). Effects of multiple interventions.
In: Comparative Quantification of Health Risks: Global and Regional
Burden of Disease Attributable to Selected Major Risk Factors Vol I.
Ezzati M, Lopez AD, Rodgers A, Murray CJL, eds. Geneva: World Health
Organization. 2191-2230.

[3] Robins JM. (1994). Correcting for non-compliance in randomized trials
using structural nested mean models. Communications in Statistics,
23(8):2379-2412.

[4] Robins JM (2004). Optimal structural nested models for optimal sequential
decisions. In Proceedings of the Second Seattle Symposium on Biostatistics.
Lin DY, Heagerty P, eds. New York: Springer.

[5] Robins JM. (1997). Causal inference from complex longitudinal data.
Latent Variable Modeling and Applications to Causality. Lecture Notes in
Statistics (120), M. Berkane, Editor. NY: Springer Verlag, 69-117.

[6] Robins JM (2003). General methodological considerations. Journal of
Econometrics, 112: 89-106.






[7] Robins JM, Wasserman L. (1997). Estimation of Effects of Sequential
Treatments by Reparameterizing Directed Acyclic Graphs. Proceedings of the
Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence
Rhode Island, August 1-3, 1997. Dan Geiger and Prakash Shenoy (Eds.),
Morgan Kaufmann, San Francisco, pp. 409-420.

[8] Robins JM, Scharfstein D, Rotnitzky A. (1999). Sensitivity Analysis for
Selection Bias and Unmeasured Confounding in Missing Data and Causal
Inference Models. In: Statistical Models in Epidemiology: The
Environment and Clinical Trials. Halloran, M.E. and Berry, D., eds. NY:
Springer-Verlag, pp. 1-94.

[9] Robins JM. (1999). Testing and estimation of direct effects by
reparameterizing directed acyclic graphs with structural nested models. In:
Computation, Causation, and Discovery. Eds. C. Glymour and G. Cooper.
Menlo Park, CA, Cambridge, MA: AAAI Press/The MIT Press, pp. 349-405.

[10] Lok JJ, Gill RD, van der Vaart AW, Robins JM. (2001) Estimating the
causal effect of a time-varying treatment on time-to-event using structural
nested failure time models. Statistica Neerlandica. 58:271-295.

[11] Murphy SA (2003). Optimal dynamic treatment regimes. Journal of the
Royal Statistical Society, Series B 65, 331-366.

[12] Robins JM, Greenland S. (1992). Identifiability and exchangeability for
direct and indirect effects. Epidemiology, 3:143-155.

[13] Hernan MA, Hernandez Diaz S. and Robins JM (2004). A structural
approach to selection bias. Epidemiology, 15: 615-625.

[14] Robins JM. (1986) Addendum to "A new approach to causal inference in
mortality studies with sustained exposure periods - Application to control
of the healthy worker survivor effect." Computers and Mathematics with
Applications, 14:923-945.

[15] Robins JM. (1999). Association, causation, and marginal structural models.
Synthese, 121:151-179.

[16] Joffe MM, Hoover DR, Jacobson LP, Kingsley L, Chmiel JS, Fischer BR,
Robins JM (1998). Estimating the effect of Zidovudine on Kaposi's sarcoma
from observational data using a rank preserving failure time model.
Statistics in Medicine. 17:1073-1102.






